Preface to the Second Edition



First published in 1995, Wavelets and Subband Coding has, in our opinion, filled
a useful need in explaining a new view of signal processing based on flexible
time-frequency analysis and its applications. The book has been well received and
used by researchers and engineers alike. In addition, it has been used as a textbook
for graduate courses at several leading universities.

So what has changed drastically in the last 12 years? The field has matured,
the teaching of these techniques is more widespread, and publication practices have
evolved. Specifically, the World Wide Web, which was in its infancy a dozen years
ago, is now a major communications medium. Thus, in agreement with our original
publisher, Prentice-Hall, we now retain the copyright, and we have decided to
allow open access to the book online (protected under the by-nc-nd license from
Creative Commons). In addition, the solutions manual, prepared by S. G. Chang,
M. M. Goodwin, V. K. Goyal and T. Kalker, is also available upon request for
teachers using the book.

We thus hope the book continues to play a useful role while getting a wider 
distribution. Enjoy it! 

Martin Vetterli, Grandvaux
Jelena Kovačević, New York City
Preface



A central goal of signal processing is to describe real-life signals, be it for
computation, compression, or understanding. In that context, transforms or linear
expansions have always played a key role. Linear expansions are present in Fourier's
original work and in Haar's construction of the first wavelet, as well as in Gabor's
work on time-frequency analysis. Today, transforms are central in fast algorithms
such as the FFT as well as in applications such as image and video compression.

Over the years, depending on open problems or specific applications,
theoreticians and practitioners have added more and more tools to the toolbox called
signal processing. Two of the newest additions have been wavelets and their
discrete-time cousins, filter banks or subband coding. From work in harmonic analysis and
mathematical physics, and from applications such as speech/image compression
and computer vision, various disciplines built up methods and tools with a similar
flavor, which can now be cast into the common framework of wavelets.

This unified view, as well as the number of applications where this framework 
is useful, are motivations for writing this book. The unification has given a new 
understanding and a fresh view of some classic signal processing problems. Another 
motivation is that the subject is exciting and the results are cute! 

The aim of the book is to present this unified view of wavelets and subband
coding. It will be done from a signal processing perspective, but with sufficient
background material such that people without signal processing knowledge will
find it useful as well. The level is that of a first-year graduate engineering book
(typically electrical engineering and computer sciences), but elementary Fourier
analysis and some knowledge of linear systems in discrete time are enough to follow
most of the book.

After the introduction (Chapter 1) and a review of the basics of vector spaces, 
linear algebra, Fourier theory and signal processing (Chapter 2), the book covers 
the five main topics in as many chapters. The discrete-time case, or filter banks, 
is thoroughly developed in Chapter 3. This is the basis for most applications, as 
well as for some of the wavelet constructions. The concept of wavelets is developed 
in Chapter 4, both with direct approaches and based on filter banks. This chapter 
describes wavelet series and their computation, as well as the construction of
modified local Fourier transforms. Chapter 5 discusses continuous wavelet and local
Fourier transforms, which are used in signal analysis, while Chapter 6 addresses 
efficient algorithms for filter banks and wavelet computations. Finally, Chapter 7 
describes signal compression, where filter banks and wavelets play an important 
role. Speech/audio, image and video compression using transforms, quantization 
and entropy coding are discussed in detail. Throughout the book we give examples 
to illustrate the concepts, and more technical parts are left to appendices. 

This book evolved from class notes used at Columbia University and the
University of California at Berkeley. Parts of the manuscript have also been used at the
University of Illinois at Urbana-Champaign and the University of Southern
California. The material was covered in a semester, but it would also be easy to carve
out a subset or skip some of the more mathematical subparts when developing a
curriculum. For example, Chapters 3, 4 and 7 can form a good core for a course in
Wavelets and Subband Coding. Homework problems are included in all chapters,
complemented with project suggestions in Chapter 7. Since there is a detailed
review chapter that makes the material as self-contained as possible, we think that
the book is useful for self-study as well.

The subjects covered in this book have recently been the focus of books, special 
issues of journals, special conference proceedings, numerous articles and even new 
journals! To us, the book by I. Daubechies [73] has been invaluable, and Chapters 4 
and 5 have been substantially influenced by it. Like the standard book by Meyer 
[194] and a recent book by Chui [49], it is a more mathematically oriented book 
than the present text. Another, more recent, tutorial book by Meyer gives an 
excellent overview of the history of the subject, its mathematical implications and 
current applications [195]. On the engineering side, the book by Vaidyanathan 
[308] is an excellent reference on filter banks, as is Malvar's book [188] for lapped 
orthogonal transforms and compression. Several other texts, including edited books, 
have appeared on wavelets [27, 51, 251], as well as on subband coding [335] and 
multiresolution signal decompositions [3]. Recent tutorials on wavelets can be found
in [128, 140, 247, 281], and on filter banks in [305, 307].

From the above, it is obvious that there is no lack of literature, yet we hope 
to provide a text with a broad coverage of theory and applications and a different 
perspective based on signal processing. We enjoyed preparing this material, and 
simply hope that the reader will find some pleasure in this exciting subject, and 
share some of our enthusiasm! 
Wavelets, Filter Banks and Multiresolution
Signal Processing



"It is with logic that one proves; 

it is with intuition that one invents."

— Henri Poincaré



The topic of this book is very old and very new. Fourier series, or expansion of
periodic functions in terms of harmonic sines and cosines, date back to the early
part of the 19th century when Fourier proposed harmonic trigonometric series [100].
The first wavelet (the only example for a long time!) was found by Haar early in
this century [126]. But the construction of more general wavelets to form bases
for square-integrable functions was investigated in the 1980's, along with efficient
algorithms to compute the expansion. At the same time, applications of these
techniques in signal processing have blossomed.

While linear expansions of functions are a classic subject, the recent
constructions contain interesting new features. For example, wavelets allow good resolution
in time and frequency, and should thus allow one to see "the forest and the trees."
This feature is important for nonstationary signal analysis. While Fourier basis
functions are given in closed form, many wavelets can only be obtained through a
computational procedure (and even then, only at specific rational points). While
this might seem to be a drawback, it turns out that if one is interested in
implementing a signal expansion on real data, then a computational procedure is better
than a closed-form expression!




The recent surge of interest in the types of expansions discussed here is due 
to the convergence of ideas from several different fields, and the recognition that 
techniques developed independently in these fields could be cast into a common 
framework. 

The name "wavelet" had been used before in the literature, 1 but its current
meaning is due to J. Goupillaud, J. Morlet and A. Grossmann [119, 125]. In the
context of geophysical signal processing they investigated an alternative to local
Fourier analysis based on a single prototype function, and its scales and shifts.
The modulation by complex exponentials in the Fourier transform is replaced by a
scaling operation, and the notion of scale 2 replaces that of frequency. The simplicity
and elegance of the wavelet scheme were appealing, and mathematicians started
studying wavelet analysis as an alternative to Fourier analysis. This led to the
discovery of wavelets which form orthonormal bases for square-integrable and other
function spaces by Meyer [194], Daubechies [71], Battle [21, 22], Lemarié [175],
and others. A formalization of such constructions by Mallat [180] and Meyer [194]
created a framework for wavelet expansions called multiresolution analysis, and
established links with methods used in other fields. Also, the wavelet construction
by Daubechies is closely connected to filter bank methods used in digital signal
processing, as we shall see.

Of course, these achievements were preceded by a long-term evolution from the
1910 Haar wavelet (which, of course, was not called a wavelet back then) to work
using octave division of the Fourier spectrum (Littlewood-Paley) and results in
harmonic analysis (Calderón-Zygmund operators). Other constructions were not
recognized as leading to wavelets initially (for example, Strömberg's work [283]).

Paralleling the advances in pure and applied mathematics were those in signal
processing, but in the context of discrete-time signals. Driven by applications such
as speech and image compression, a method called subband coding was proposed in
the late 1970's by Croisier, Esteban, and Galand [69], using a special class of filters
called quadrature mirror filters (QMF), and by Crochiere, Webber and Flanagan
[68]. This led to the study of perfect reconstruction filter banks, a problem solved
in the 1980's by several people, including Smith and Barnwell [270, 271], Mintzer
[196], Vetterli [315], and Vaidyanathan [306].
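Perfect reconstruction can be illustrated with the simplest two-channel filter bank. The sketch below assumes Haar filters (lowpass [1, 1]/sqrt(2), highpass [1, -1]/sqrt(2)). This is an illustrative choice and not the QMF designs cited above. The function names `haar_analysis` and `haar_synthesis` are hypothetical. The input is split into two half-rate subbands and then rebuilt exactly.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_analysis(x):
    """Split an even-length sequence into a lowpass and a highpass subband.

    Output k of each channel is the inner product of the pair (x[2k], x[2k+1])
    with [1, 1]/sqrt(2) or [1, -1]/sqrt(2); this is filtering (with the
    time-reversed vectors) followed by keeping every second output.
    """
    if len(x) % 2:
        raise ValueError("input length must be even")
    half = len(x) // 2
    low = [(x[2 * k] + x[2 * k + 1]) / SQRT2 for k in range(half)]
    high = [(x[2 * k] - x[2 * k + 1]) / SQRT2 for k in range(half)]
    return low, high


def haar_synthesis(low, high):
    """Upsample, filter, and sum the two subbands to rebuild the input."""
    x = []
    for a, d in zip(low, high):
        x.append((a + d) / SQRT2)
        x.append((a - d) / SQRT2)
    return x


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
low, high = haar_analysis(signal)
rebuilt = haar_synthesis(low, high)
# Critically sampled: 4 + 4 subband samples for 8 input samples.
assert len(low) + len(high) == len(signal)
# Perfect reconstruction, up to floating-point rounding.
assert all(abs(u - v) < 1e-12 for u, v in zip(signal, rebuilt))
```

The Haar pair is chosen only because it makes reconstruction easy to verify by hand. Longer filters reach the same property with better frequency selectivity.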

In a particular configuration, namely when the filter bank has octave bands,
one obtains a discrete-time wavelet series. Such a configuration has been popular
in signal processing less for its mathematical properties than because an octave
band or logarithmic spectrum is more natural for certain applications such as audio
compression, since it emulates the hearing process. Such an octave-band filter bank
can be used, under certain conditions, to generate wavelet bases, as shown by
Daubechies [71].

1 For example, for the impulse response of a layer in geophysical signal processing by Ricker
[237] and for a causal finite-energy function by Robinson [248].

2 For a beautiful illustration of the notion of scale, and an argument for geometric spacing of
scale in natural imagery, see [197].

In computer vision, multiresolution techniques have been used for various
problems, ranging from motion estimation to object recognition [249]. Images are
successively approximated starting from a coarse version and going to a fine-resolution
version. In particular, Burt and Adelson proposed such a scheme for image coding
in the early 1980's [41], calling it pyramid coding. 3 This method turns out to be
similar to subband coding. Moreover, the successive approximation view is similar
to the multiresolution framework used in the analysis of wavelet schemes.

In computer graphics, a method called successive refinement iteratively
interpolates curves or surfaces, and the study of such interpolators is related to wavelet
constructions from filter banks [45, 92].

Finally, many computational procedures use the concept of successive
approximation, sometimes alternating between fine and coarse resolutions. The multigrid
methods used for the solution of partial differential equations [39] are an example.

While these interconnections are now clarified, this has not always been the 
case. In fact, maybe one of the biggest contributions of wavelets has been to bring 
people from different fields together, and from that cross-fertilization and exchange
of ideas and methods, progress has been achieved in various fields. 

In what follows, we will mostly take a signal processing point of view of the
subject. Also, most applications discussed later are from signal processing.

1.1 Series Expansions of Signals

We are considering linear expansions of signals or functions. That is, given any
signal $x$ from some space $S$, where $S$ can be finite-dimensional (for example,
$\mathbb{R}^n$, $\mathbb{C}^n$) or infinite-dimensional (for example, $l_2(\mathbb{Z})$,
$L_2(\mathbb{R})$), we want to find a set of elementary signals
$\{\varphi_i\}_{i \in \mathbb{Z}}$ for that space so that we can write $x$ as a linear
combination

$$x = \sum_i \alpha_i \varphi_i. \qquad (1.1.1)$$

The set $\{\varphi_i\}$ is complete for the space $S$ if all signals $x \in S$ can be
expanded as in (1.1.1). In that case, there will also exist a dual set
$\{\tilde\varphi_i\}_{i \in \mathbb{Z}}$ such that the expansion coefficients in (1.1.1)
can be computed as

$$\alpha_i = \sum_n \tilde\varphi_i[n]\, x[n],$$

when $x$ and $\tilde\varphi_i$ are real discrete-time sequences, and

$$\alpha_i = \int \tilde\varphi_i(t)\, x(t)\, dt,$$

when they are real continuous-time functions. The above expressions are the inner
products of the $\tilde\varphi_i$'s with the signal $x$, denoted by
$\langle \tilde\varphi_i, x \rangle$. An important particular case is when the set
$\{\varphi_i\}$ is orthonormal and complete, since then we have an orthonormal basis
for $S$ and the basis and its dual are the same, that is, $\varphi_i = \tilde\varphi_i$.
Then

$$\langle \varphi_i, \varphi_j \rangle = \delta[i-j],$$

where $\delta[i]$ equals 1 if $i = 0$, and 0 otherwise. If the set is complete and the
vectors $\varphi_i$ are linearly independent but not orthonormal, then we have a
biorthogonal basis, and the basis and its dual satisfy

$$\langle \varphi_i, \tilde\varphi_j \rangle = \delta[i-j].$$

If the set is complete but redundant (the $\varphi_i$'s are not linearly independent),
then we do not have a basis but an overcomplete representation called a frame. To
illustrate these concepts, consider the following example.

3 The importance of the pyramid algorithm was not immediately recognized. One of the
reviewers of the original Burt and Adelson paper said, "I suspect that no one will ever use
this algorithm again."

Figure 1.1 Examples of possible sets of vectors for the expansion of $\mathbb{R}^2$.
(a) Orthonormal case. (b) Biorthogonal case. (c) Overcomplete case.



Example 1.1 Set of Vectors for the Plane

We show in Figure 1.1 some possible sets of vectors for the expansion of the plane, or
$\mathbb{R}^2$. The standard Euclidean basis is given by $e_0$ and $e_1$. In part (a), an
orthonormal basis is given by $\varphi_0 = [1, 1]^T/\sqrt{2}$ and
$\varphi_1 = [1, -1]^T/\sqrt{2}$. The dual basis is identical, or
$\tilde\varphi_i = \varphi_i$. In part (b), a biorthogonal basis is given, with
$\varphi_0 = e_0$ and $\varphi_1 = [1, 1]^T$. The dual basis is now
$\tilde\varphi_0 = [1, -1]^T$ and $\tilde\varphi_1 = [0, 1]^T$. Finally, in part (c), an
overcomplete set is given, namely $\varphi_0 = [1, 0]^T$,
$\varphi_1 = [-1/2, \sqrt{3}/2]^T$ and $\varphi_2 = [-1/2, -\sqrt{3}/2]^T$. Then, it can be
verified that a possible reconstruction basis is identical (up to a scale factor), namely,
$\tilde\varphi_i = \frac{2}{3}\varphi_i$ (the reconstruction basis is not unique). This set
behaves as an orthonormal basis, even though the vectors are linearly dependent.
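The three cases of Example 1.1 can be checked numerically. The sketch below computes the coefficients as inner products with the dual set and then resynthesizes with the expansion set, exactly as in (1.1.1). The helper names `dot` and `expand` are hypothetical, and the test vector `x` is arbitrary.

```python
import math


def dot(u, v):
    """Inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))


def expand(x, basis, dual):
    """Compute alpha_i = <dual_i, x>, then return sum_i alpha_i * basis_i."""
    alphas = [dot(d, x) for d in dual]
    return [sum(a * b[k] for a, b in zip(alphas, basis)) for k in range(len(x))]


s = 1 / math.sqrt(2)
r3 = math.sqrt(3) / 2

# (a) Orthonormal basis: the dual set is the basis itself.
ortho = [[s, s], [s, -s]]
# (b) Biorthogonal basis and its dual, as in Example 1.1.
bi = [[1.0, 0.0], [1.0, 1.0]]
bi_dual = [[1.0, -1.0], [0.0, 1.0]]
# (c) Overcomplete set of three vectors; one valid dual is (2/3) * phi_i.
frame = [[1.0, 0.0], [-0.5, r3], [-0.5, -r3]]
frame_dual = [[2 / 3 * c for c in v] for v in frame]

x = [0.3, -1.7]
for name, basis, dual in [("orthonormal", ortho, ortho),
                          ("biorthogonal", bi, bi_dual),
                          ("frame", frame, frame_dual)]:
    y = expand(x, basis, dual)
    error = max(abs(a - b) for a, b in zip(x, y))
    print(name, error < 1e-12)  # each line should end with True
```

In the frame case, three coefficients describe a two-dimensional vector. The factor 2/3 compensates for that redundancy, because the three vectors satisfy sum_i phi_i phi_i^T = (3/2) I.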
The representation in (1.1.1) is a change of basis, or, conceptually, a change
of point of view. The obvious question is, what is a good basis $\{\varphi_i\}$ for $S$? The
answer depends on the class of signals we want to represent, and on the choice
of a criterion for quality. However, in general, a good basis is one that allows
compact representation or less complex processing. For example, the
Karhunen-Loève transform concentrates as much energy in as few coefficients as possible, and
is thus good for compression, while, for the implementation of convolution, the
Fourier basis is computationally more efficient than the standard basis.
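The energy compaction of the Karhunen-Loève transform can be seen in the smallest nontrivial case. Below is a minimal sketch restricted to two-dimensional data. It uses the closed-form dominant eigenvector of a 2x2 covariance matrix, which gives the KLT basis there. The data pairs are made up for illustration, and `klt_2d` and `energy_fractions` are hypothetical helper names.

```python
import math


def klt_2d(samples):
    """Principal axes (KLT basis) of 2-D data, via the closed form for 2x2 matrices."""
    n = len(samples)
    mu = sum(u for u, _ in samples) / n
    mv = sum(v for _, v in samples) / n
    centered = [(u - mu, v - mv) for u, v in samples]
    a = sum(u * u for u, _ in centered) / n      # variance of component 0
    c = sum(v * v for _, v in centered) / n      # variance of component 1
    b = sum(u * v for u, v in centered) / n      # covariance
    theta = 0.5 * math.atan2(2 * b, a - c)       # angle of the largest-eigenvalue axis
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    return v1, v2


def energy_fractions(samples, v1, v2):
    """Share of total energy carried by the first and second coefficients."""
    e1 = sum((u * v1[0] + v * v1[1]) ** 2 for u, v in samples)
    e2 = sum((u * v2[0] + v * v2[1]) ** 2 for u, v in samples)
    return e1 / (e1 + e2), e2 / (e1 + e2)


# Made-up pairs of neighbouring samples: strongly correlated, zero mean.
data = [(1.0, 0.9), (-1.0, -1.1), (2.0, 2.1), (-2.0, -1.9), (0.5, 0.4), (-0.5, -0.4)]
v1, v2 = klt_2d(data)
print("standard basis:", energy_fractions(data, (1.0, 0.0), (0.0, 1.0)))
print("KLT basis:     ", energy_fractions(data, v1, v2))
```

With the standard basis the energy is split roughly evenly between the two coefficients. With the KLT basis, almost all of it lands in the first coefficient, which is what makes the second one cheap to code.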

We will be interested mostly in expansions with some structure, that is, expan- 
sions where the various basis vectors are related to each other by some elementary 
operations such as shifting in time, scaling, and modulation (which is shifting in 
frequency). Because we are concerned with expansions for very high-dimensional
spaces (possibly infinite), bases without such structure are useless for complexity 
reasons. 

Historically, the Fourier series for periodic signals is the first example of a signal 
expansion. The basis functions are harmonic sines and cosines. Is this a good set 
of basis functions for signal processing? Besides its obvious limitation to periodic 
signals, it has very useful properties, such as the convolution property which comes 
from the fact that the basis functions are eigenfunctions of linear time-invariant 
systems. The extension of the scheme to nonperiodic signals, 4 by segmentation and 
piecewise Fourier series expansion of each segment, suffers from artificial boundary 
effects and poor convergence at these boundaries (due to the Gibbs phenomenon). 
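The eigenfunction property behind the convolution theorem can be checked directly. The sketch below uses the discrete, periodic analogue of the Fourier series: N-point complex exponentials passed through a circular convolution. It verifies that each harmonic comes out only scaled, by the filter's frequency response at that harmonic. The filter taps and the name `circular_filter` are hypothetical.

```python
import cmath


def circular_filter(h, x):
    """y[n] = sum_k h[k] * x[(n - k) mod N]: a linear, circularly shift-invariant system."""
    N = len(x)
    return [sum(h[k] * x[(n - k) % N] for k in range(len(h))) for n in range(N)]


N = 8
h = [0.5, 0.3, -0.2]  # arbitrary illustrative filter taps
for m in range(N):
    # m-th discrete harmonic e^{j 2 pi m n / N}
    e = [cmath.exp(2j * cmath.pi * m * n / N) for n in range(N)]
    y = circular_filter(h, e)
    # Predicted eigenvalue: frequency response of h at omega = 2 pi m / N.
    H = sum(h[k] * cmath.exp(-2j * cmath.pi * m * k / N) for k in range(len(h)))
    assert all(abs(y[n] - H * e[n]) < 1e-12 for n in range(N))
```

Since every harmonic is an eigenvector, convolution reduces to multiplying each Fourier coefficient by one number. This is why the Fourier basis suits linear time-invariant processing.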

An attempt to create local Fourier bases is the Gabor transform or short-time
Fourier transform (STFT). A smooth window is applied to the signal centered
around $t = nT_0$ (where $T_0$ is some basic time step), and a Fourier expansion is
applied to the windowed signal. This leads to a time-frequency representation since
we get approximate information about the frequency content of the signal around
the location $nT_0$. Usually, frequency points spaced $2\pi/T_0$ apart are used and we
get a sampling of the time-frequency plane on a rectangular grid. The spectrogram
is related to such a time-frequency analysis. Note that the functions used in the
expansion are related to each other by shift in time and modulation, and that we
obtain a linear frequency analysis. While the STFT has proven useful in signal
analysis, there are no good orthonormal bases based on this construction. Also,
a logarithmic frequency scale, or constant relative bandwidth, is often preferable
to the linear frequency scale obtained with the STFT. For example, the human
auditory system uses constant relative bandwidth channels (critical bands), and
therefore, audio compression systems use a similar decomposition.
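A discrete-time STFT can be sketched as a sliding window followed by a DFT of each windowed segment. This places the coefficients on the rectangular time-frequency grid described above. The sketch assumes a periodic Hann window, which is one possible smooth window. It computes the DFT directly for clarity rather than with an FFT. The function name `stft` and the test signal are hypothetical.

```python
import cmath
import math


def stft(x, window_len, hop):
    """Windowed DFTs of x; returns frames[time_index][frequency_bin].

    The window slides by `hop` samples and every frame has `window_len`
    frequency bins spaced 2*pi/window_len apart: a rectangular grid.
    """
    # Periodic Hann window: one possible smooth window (an assumption here).
    w = [0.5 - 0.5 * math.cos(2 * math.pi * n / window_len) for n in range(window_len)]
    frames = []
    for start in range(0, len(x) - window_len + 1, hop):
        seg = [x[start + n] * w[n] for n in range(window_len)]
        frames.append([sum(seg[n] * cmath.exp(-2j * math.pi * k * n / window_len)
                           for n in range(window_len))
                       for k in range(window_len)])
    return frames


# A tone that switches from bin 2 to bin 6 halfway through (window of 16).
L = 16
x = [math.cos(2 * math.pi * (2 if n < 32 else 6) * n / L) for n in range(64)]
frames = stft(x, window_len=L, hop=L)
for t, spectrum in enumerate(frames):
    peak = max(range(L // 2 + 1), key=lambda k: abs(spectrum[k]))
    print(f"frame {t}: strongest bin {peak}")  # frames 0-1: bin 2, frames 2-3: bin 6
```

Every frame uses the same window length. The time resolution is therefore the same at all frequencies, which is the linear frequency analysis contrasted above with constant relative bandwidth.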



4 The Fourier transform of nonperiodic signals is also possible. It is an integral transform rather 
than a series expansion and lacks any time locality. 
Figure 1.2 Musical notation and orthonormal wavelet bases. (a) The western
musical notation uses a logarithmic frequency scale with twelve halftones per
octave. In this example, notes are chosen as in an orthonormal wavelet basis,
with long low-pitched notes, and short high-pitched ones. (b) Corresponding
time-domain functions.



A popular alternative to the STFT is the wavelet transform. Using scales and 
shifts of a prototype wavelet, a linear expansion of a signal is obtained. Because the 
scales used are powers of an elementary scale factor (typically 2), the analysis uses 
a constant relative bandwidth (or, the frequency axis is logarithmic). The sampling 
of the time-frequency plane is now very different from the rectangular grid used in 
the STFT. Lower frequencies, where the bandwidth is narrow (that is, the basis 
functions are stretched in time) are sampled with a large time step, while high 
frequencies (which correspond to short basis functions) are sampled more often. In 
Figure 1.2, we give an intuitive illustration of this time-frequency trade-off, and
relate it to musical notation which also uses a logarithmic frequency scale. 5 What 
is particularly interesting is that such a wavelet scheme allows good orthonormal 
bases whereas the STFT does not. 

In the discussions above, we implicitly assumed continuous-time signals. Of
course there are discrete-time equivalents to all these results. A local analysis
can be achieved using a block transform, where the sequence is segmented into
adjacent blocks of N samples, and each block is individually transformed. As is to be
expected, such a scheme is plagued by boundary effects, also called blocking effects.
A more general expansion relies on filter banks, and can achieve either STFT-like
analysis (rectangular sampling of the time-frequency plane) or wavelet-like analysis
(constant relative bandwidth in frequency). Discrete-time expansions based on
filter banks are not arbitrary; rather, they are structured expansions. Again, for
complexity reasons, it is useful to impose such a structure on the basis chosen
for the expansion. For example, filter banks correspond to basis sequences which
satisfy a block shift invariance property. Sometimes, a modulation constraint can
also be added, in particular in STFT-like discrete-time bases. Because we are in
discrete time, scaling cannot be done exactly (unlike in continuous time), but an
approximate scaling property between basis functions holds for the discrete-time
wavelet series.

5 This is the standard western musical notation based on J.S. Bach's "Well-Tempered
Clavier." Thus one could argue that wavelets were actually invented by J.S. Bach!

Interestingly, the relationship between continuous- and discrete-time bases runs 
deeper than just these conceptual similarities. One of the most interesting con- 
structions of wavelets is the one by Daubechies [71]. It relies on the iteration 
of a discrete-time filter bank so that, under certain conditions, it converges to a 
continuous-time wavelet basis. Furthermore, the multiresolution framework used 
in the analysis of wavelet decompositions automatically associates a discrete-time 
perfect reconstruction filter bank to any wavelet decomposition. Finally, the
wavelet series decomposition can be computed with a filter bank algorithm. Therefore,
especially for wavelet-type signal expansions, there is a very close interaction
between discrete and continuous time.
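The filter bank algorithm for a discrete-time wavelet series can be sketched by iterating a two-channel split on the lowpass branch only. This produces the octave-band tree. The sketch assumes Haar filters and an input length divisible by 2**levels. The names `haar_split`, `haar_dwt` and `haar_idwt` are hypothetical. Applied to an impulse, each octave band ends up with a single nonzero coefficient, which echoes the time localization shown in Figure 1.3(e).

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_split(x):
    """One Haar analysis stage: (coarse, detail), each half the input length."""
    half = len(x) // 2
    coarse = [(x[2 * k] + x[2 * k + 1]) / SQRT2 for k in range(half)]
    detail = [(x[2 * k] - x[2 * k + 1]) / SQRT2 for k in range(half)]
    return coarse, detail


def haar_dwt(x, levels):
    """Octave-band tree: only the lowpass (coarse) branch is split again."""
    coarse, details = list(x), []
    for _ in range(levels):
        coarse, d = haar_split(coarse)
        details.append(d)  # finest octave first
    return coarse, details


def haar_idwt(coarse, details):
    """Run the synthesis stages in reverse order to rebuild the signal."""
    x = list(coarse)
    for d in reversed(details):
        merged = []
        for a, b in zip(x, d):
            merged += [(a + b) / SQRT2, (a - b) / SQRT2]
        x = merged
    return x


impulse = [0.0] * 16
impulse[5] = 1.0
coarse, details = haar_dwt(impulse, levels=3)
print([len(d) for d in details], len(coarse))  # [8, 4, 2] 2
```

The band sizes halve at each stage, and they sum to the input length (8 + 4 + 2 + 2 = 16). The expansion is therefore critically sampled, unlike the pyramid discussed next.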

Note that we have focused on STFT- and wavelet-type expansions
mainly because they are now quite standard. However, there are many alternatives,
for example the wavelet packet expansion introduced by Coifman and coworkers 
[62, 64], and generalizations thereof. The main ingredients remain the same: they 
are structured bases in discrete or continuous time, and they permit different time 
versus frequency resolution trade-offs. An easy way to interpret such expansions 
is in terms of their time-frequency tiling: each basis function has a region in the 
time-frequency plane where most of its energy is concentrated. Then, given a basis 
and the expansion coefficients of a signal, one can draw a tiling where the shading 
corresponds to the value of the expansion coefficient. 6 

Example 1.2 Different Time-Frequency Tilings

Figure 1.3 shows schematically different possible expansions of a very simple discrete-time 
signal, namely a sine wave plus an impulse (see part (a)). It would be desirable to have 
an expansion that captures both the isolated impulse (or Dirac in time) and the isolated 
frequency component (or Dirac in frequency). The first two expansions, namely the identity
transform in part (b) and the discrete-time Fourier series 7 in part (c), isolate the time and 
frequency impulse, respectively, but not both. The local discrete-time Fourier series in part 
(d) achieves a compromise, by locating both impulses to a certain degree. The discrete-time 
wavelet series in part (e) achieves better localization of the time-domain impulse, without 
sacrificing too much of the frequency localization. However, a high-frequency sinusoid would 
not be well localized. This simple example indicates some of the trade-offs involved. 



6 Such tiling diagrams were used by Gabor [102], and he called an elementary tile a "logon." 
7 Discrete-time series expansions are often called discrete-time transforms, both in the Fourier 
and in the wavelet case. 
Figure 1.3 Time-frequency tilings for a simple discrete-time signal [130]. (a)
Sine wave plus impulse. (b) Expansion onto the identity basis. (c) Discrete-time
Fourier series. (d) Local discrete-time Fourier series. (e) Discrete-time
wavelet series.



Note that the local Fourier transform and the wavelet transform can be used
for signal analysis purposes. In that case, the goal is not to obtain orthonormal
bases, but rather to characterize the signal from the transform. The local Fourier
transform retains many of the characteristics of the usual Fourier transform with a
localization given by the window function, which is thus constant at all frequencies
(this phenomenon can be seen already in Figure 1.3(d)). The wavelet, on the
other hand, acts as a microscope, focusing on smaller time phenomena as the
scale becomes small (Figure 1.3(e) shows how the impulse gets better localized
at high frequencies). This behavior permits a local characterization of functions,
which the Fourier transform does not. 8

1.2 Multiresolution Concept

A slightly different expansion is obtained with multiresolution pyramids since the
expansion is actually redundant (the number of samples in the expansion is
bigger than in the original signal). However, conceptually, it is intimately related to
subband and wavelet decompositions. The basic idea is successive approximation.
A signal is written as a coarse approximation (typically a lowpass, subsampled
version) plus a prediction error which is the difference between the original signal
and a prediction based on the coarse version. Reconstruction is immediate: simply
add back the prediction to the prediction error. The scheme can be iterated on the
coarse version. It can be shown that if the lowpass filter meets certain constraints of
orthogonality, then this scheme is identical to an oversampled discrete-time wavelet
series. Otherwise, the successive approximation approach is still at least
conceptually identical to the wavelet decomposition since it performs a multiresolution
analysis of the signal.
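One level of the successive approximation scheme can be sketched in one dimension. The sketch assumes a crude decimator (pairwise averaging) and a crude interpolator (sample repetition). These are illustrative stand-ins, not the filters of Burt and Adelson, and they do not satisfy the orthogonality constraints mentioned above. The names `decimate`, `interpolate`, `pyramid_encode` and `pyramid_decode` are hypothetical.

```python
def decimate(x):
    """D: average each pair of samples (a crude lowpass) and keep one per pair."""
    return [(x[2 * k] + x[2 * k + 1]) / 2 for k in range(len(x) // 2)]


def interpolate(c):
    """I: upsample by 2 by repeating each coarse sample (a crude interpolator)."""
    out = []
    for v in c:
        out += [v, v]
    return out


def pyramid_encode(x):
    """Return the coarse version and the full-length prediction error."""
    coarse = decimate(x)
    prediction = interpolate(coarse)
    error = [a - p for a, p in zip(x, prediction)]
    return coarse, error


def pyramid_decode(coarse, error):
    """Add the prediction, computed from the coarse version, back to the error."""
    return [p + e for p, e in zip(interpolate(coarse), error)]


x = [3.0, 5.0, 4.0, 4.0, 9.0, 7.0, 1.0, 1.0]
coarse, error = pyramid_encode(x)
print(coarse)  # [4.0, 4.0, 8.0, 1.0]
print(error)   # [-1.0, 1.0, 0.0, 0.0, 1.0, -1.0, 0.0, 0.0]
print(pyramid_decode(coarse, error) == x)  # True
```

Reconstruction is exact for any choice of D and I, because the error is defined relative to the same prediction the decoder recomputes. The price is redundancy: 4 + 8 = 12 stored samples for 8 input samples.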

A schematic diagram of a pyramid decomposition, with attached resulting
images, is shown in Figure 1.4. After the encoding, we have a coarse resolution image
of half size, as well as an error image of full size (thus the redundancy). For
applications, the decomposition into a coarse resolution which gives an approximate but
adequate version of the full image, plus a difference or detail image, is conceptually
very important.

Example 1.3 Multiresolution Image Database

Let us consider the following practical problem: Users want to access and retrieve electronic 
images from an image database using a computer network with limited bandwidth. Because 
the users have an approximate idea of which image they want, they will first browse through 
some images before settling on a target image [214]. Given the limited bandwidth, browsing 
is best done on coarse versions of the images which can be transmitted faster. Once an image 
is chosen, the residual can be sent. Thus, the scheme shown in Figure 1.4 can be used, where 
the coarse and residual images are further compressed to diminish the transmission time. 

The above example is just one among many schemes where multiresolution
decompositions are useful in communications problems. Others include transmission
over error-prone channels, where the coarse resolution can be better protected to
guarantee some minimum level of quality.

8 For example, in [137], this mathematical microscope is used to analyze some famous lacunary
Fourier series that were proposed over a century ago.

Figure 1.4 Pyramid decomposition of an image where encoding is shown on the
left and decoding is shown on the right. The operators D and I correspond to
decimation and interpolation operators, respectively. For example, D produces
an $N/2 \times N/2$ image from an $N \times N$ original, while I interpolates an
$N \times N$ image based on an $N/2 \times N/2$ original.

Multiresolution decompositions are also important for computer vision tasks
such as image segmentation or object recognition: the task is performed in a
successive approximation manner, starting on the coarse version and then using this
result as an initial guess for the full task. However, this is a greedy approach which
is sometimes suboptimal. Figure 1.5 shows a famous counter-example, where a
multiresolution approach would be seriously misleading...

Interestingly, the multiresolution concept, besides being intuitive and useful in 
practice, forms the basis of a mathematical framework for wavelets [181, 194]. As 
in the pyramid example shown in Figure 1.4, one can decompose a function into a 
coarse version plus a residual, and then iterate this to infinity. If properly done, 
this can be used to analyze wavelet schemes and derive wavelet bases. 



1.3 Overview of the Book

We start with a review of fundamentals in Chapter 2. This chapter should make
the book as self-contained as possible. It reviews Hilbert spaces at an elementary
but sufficient level, linear algebra (including matrix polynomials) and Fourier
theory, with material on sampling and discrete-time Fourier transforms in particular.
The review of continuous-time and discrete-time signal processing is followed by
a discussion of multirate signal processing, which is a topic central to later
chapters. Finally, a short introduction to time-frequency distributions discusses the
local Fourier transform and the wavelet transform, and shows the uncertainty
principle. The appendix gives factorizations of unitary matrices, and reviews results on
convergence and regularity of functions.

Figure 1.5 Counter-example to multiresolution technique. The coarse
approximation is unrelated to the full-resolution image (Comet Photo AG).

Chapter 3 focuses on discrete-time bases and filter banks. This topic is impor- 
tant for several later chapters as well as for applications. We start with two simple 
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expansions which will reappear throughout the book as a recurring theme: the Haar 
and the sine bases. They are limit cases of orthonormal expansions with good time 
localization (Haar) and good frequency localization (sine). This naturally leads to 
an in-depth study of two-channel filter banks, including analytical tools for their 
analysis as well as design methods. The construction of orthonormal and linear 
phase filter banks is described. Multichannel filter banks are developed next, first 
through tree structures and then in the general case. Modulated filter banks, cor- 
responding conceptually to a discrete-time local Fourier analysis, are addressed as 
well. Next, pyramid schemes and overcomplete representations are explored. Such 
schemes, while not critically sampled, have some other attractive features, such 
as time invariance. Then, the multidimensional case is discussed both for simple 
separable systems, as well as for general nonseparable ones. The latter systems 
involve lattice sampling which is detailed in an appendix. Finally, filter banks for 
telecommunications, namely transmultiplexers and adaptive subband filtering, are
presented briefly. The appendix details factorizations of orthonormal filter banks 
(corresponding to paraunitary matrices). 

Chapter 4 is devoted to the construction of bases for continuous-time signals, 
in particular wavelets and local cosine bases. Again, the Haar and sinc cases play
illustrative roles as extremes of wavelet constructions. After an introduction to
series expansions, we develop multiresolution analysis as a framework for wavelet
constructions. This naturally leads to the classic wavelets of Meyer and Battle-Lemarié
or Strömberg. These are based on Fourier-domain analysis. This is followed
by Daubechies' construction of wavelets from iterated filter banks. This is a time- 
domain construction based on the iteration of a multirate filter. Study of the 
iteration leads to the notion of regularity of the discrete-time filter. Then, the 
wavelet series expansion is considered both in terms of properties and computation 
of the expansion coefficients. Some generalizations of wavelet constructions are 
considered next, first in one dimension (including biorthogonal and multichannel 
wavelets) and then in multiple dimensions, where nonseparable wavelets are shown. 
Finally, local cosine bases are derived and they can be seen as a real-valued local 
Fourier transform. 

Chapter 5 is concerned with continuous wavelet and Fourier transforms. Unlike 
the series expansions in Chapters 3 and 4, these are very redundant representa- 
tions useful for signal analysis. Both transforms are analyzed, inverses are derived, 
and their main properties are given. These transforms can be sampled, that is, 
scale/frequency and time shift can be discretized. This leads to redundant series 
representations called frames. In particular, reconstruction or inversion is discussed, 
and the case of wavelet and local Fourier frames is considered in some detail. 

Chapter 6 treats algorithmic and computational aspects of series expansions. 
First, a review of classic fast algorithms for signal processing is given since they 
form the ingredients used in subsequent algorithms. The key role of the fast Fourier
transform (FFT) is pointed out. The complexity of computing filter banks, that is, 
discrete-time expansions, is studied in detail. Important cases include the discrete- 
time wavelet series or transform and modulated filter banks. The latter corresponds 
to a local discrete-time Fourier series or transform, and uses FFT's for efficient com- 
putation. These filter bank algorithms have direct applications in the computation 
of wavelet series. Overcomplete expansions are considered next, in particular for 
the computation of a sampled continuous wavelet transform. The chapter concludes 
with a discussion of special topics related to efficient convolution algorithms and 
also application of wavelet ideas to numerical algorithms. 

The last chapter is devoted to one of the main applications of wavelets and 
filter banks in signal processing, namely signal compression. The technique is often 
called subband coding because signals are considered in spectral bands for com- 
pression purposes. First comes a review of transform based compression, including 
quantization and entropy coding. Then follow specific discussions of one-, two- and 
three-dimensional signal compression methods based on transforms. Speech and 
audio compression, where subband coding was first invented, is discussed. The 
success of subband coding in current audio coding algorithms is shown on spe- 
cific examples such as the MUSICAM standard. A thorough discussion of image 
compression follows. While current standards such as JPEG are block transform 
based, some innovative subband or wavelet schemes are very promising and are 
described in detail. Video compression is considered next. Besides expansions, 
motion estimation/compensation methods play a key role and are discussed. The 
multiresolution feature inherent in pyramid and subband coding is pointed out as 
an attractive feature for video compression, just as it is for image coding. The final 
section discusses the interaction of source coding, particularly the multiresolution 
type, and channel coding or transmission. This joint source-channel coding is key 
to new applications of image and video compression, as in transmission over packet 
networks. An appendix gives a brief review of statistical signal processing which 
underlies coding methods. 
Fundamentals of Signal Decompositions



"A journey of a thousand miles 

must begin with a single step."

- Lao-Tzu, Tao Te Ching 



The mathematical framework necessary for our later developments is established
in this chapter. While we review standard material, we also cover the broad spectrum
from Hilbert spaces and Fourier theory to signal processing and time-frequency
distributions. Furthermore, the review is done from the point of view of the chap- 
ters to come, namely, signal expansions. This chapter attempts to make the book 
as self-contained as possible. 

We tried to keep the level of formalism reasonable, and refer to standard texts for 
many proofs. While this chapter may seem dry, basic mathematics is the foundation 
on which the rest of the concepts are built, and therefore, some solid groundwork 
is justified. 

After defining notations, we discuss Hilbert spaces. In their finite-dimensional 
form, Hilbert spaces are familiar to everyone. Their infinite-dimensional counterparts,
in particular $L_2(\mathbb{R})$ and $l_2(\mathbb{Z})$, are derived, since they are fundamental to
signal processing in general and to our developments in particular. Linear opera- 
tors on Hilbert spaces and (in finite dimensions) linear algebra are discussed briefly. 
The key ideas of orthonormal bases, orthogonal projection and best approximation 
are detailed, as well as general bases and overcomplete expansions, or frames.

We then turn to a review of Fourier theory which starts with the Fourier trans- 
form and series. The expansion of bandlimited signals and sampling naturally lead 
to the discrete-time Fourier transform and series. 

Next comes a brief review of continuous-time and discrete-time signal processing,
followed by a discussion of multirate discrete-time signal processing. It should
be emphasized that this last topic is central to the rest of the book, but not often 
treated in standard signal processing books. 

Finally, we review time-frequency representations, in particular short-time Fourier
or Gabor expansions as well as the newer wavelet expansion. We also discuss the 
uncertainty relation, which is a fundamental limit in linear time-frequency repre- 
sentations. A bilinear expansion, the Wigner-Ville transform, is also introduced. 

2.1 Notations 

Let $\mathbb{C}$, $\mathbb{R}$, $\mathbb{Z}$ and $\mathbb{N}$ denote the sets of complex, real, integer and natural numbers,
respectively. Then, $\mathbb{C}^n$ and $\mathbb{R}^n$ will be the sets of all $n$-tuples $(x_1, \ldots, x_n)$ of
complex and real numbers, respectively.

The superscript $^*$ denotes complex conjugation, or, $(a + jb)^* = a - jb$, where
the symbol $j$ is used for the square root of $-1$ and $a, b \in \mathbb{R}$. The subscript $_*$ is used
to denote complex conjugation of the constants but not the complex variable, for
example, $(az)_* = a^* z$ where $z$ is a complex variable. The superscript $T$ denotes the
transposition of a vector or a matrix, while the superscript $^*$ on a vector or matrix
denotes hermitian transpose, or transposition and complex conjugation. $\mathrm{Re}\{z\}$ and
$\mathrm{Im}\{z\}$ denote the real and imaginary parts of the complex number $z$.

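We define the $N$th root of unity as $W_N = e^{-j2\pi/N}$. It satisfies the following:

$$W_N^N = 1, \tag{2.1.1}$$

$$W_N^{kN+i} = W_N^i, \quad \text{with } k, i \text{ in } \mathbb{Z}, \tag{2.1.2}$$

$$\sum_{k=0}^{N-1} W_N^{kn} = \begin{cases} N, & n = lN,\ l \in \mathbb{Z}, \\ 0, & \text{otherwise}. \end{cases} \tag{2.1.3}$$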

The last relation is often referred to as orthogonality of the roots of unity. 
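The three properties can be checked numerically. The sketch below is a minimal illustration assuming NumPy; the helper names root_of_unity and root_sum are ours, and floating-point results are compared with a tolerance rather than exactly.

```python
import numpy as np

def root_of_unity(N):
    """W_N = exp(-j 2 pi / N), the Nth root of unity as defined in the text."""
    return np.exp(-2j * np.pi / N)

def root_sum(N, n):
    """Sum_{k=0}^{N-1} W_N^{kn}: equals N when n is a multiple of N, 0 otherwise."""
    k = np.arange(N)
    return np.sum(root_of_unity(N) ** (k * n))

N = 8
W = root_of_unity(N)
print(np.isclose(W ** N, 1))                      # (2.1.1)
print(np.isclose(W ** (3 * N + 5), W ** 5))       # (2.1.2) with k = 3, i = 5
for n in [0, 3, 8, -16, 5]:
    # magnitude of the sum: N for multiples of N, (numerically) zero otherwise
    print(n, np.round(abs(root_sum(N, n)), 10))
```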

Often we deal with functions of a continuous variable, and a related sequence 
indexed by an integer (typically, the latter is a sampled version of the former). To 
avoid confusion, and in keeping with the tradition of the signal processing litera- 
ture [211], we use parentheses around a continuous variable and brackets around a 
discrete one, for example, f(t) and x[n], where 

$$x[n] = f(nT), \qquad n \in \mathbb{Z},\ T \in \mathbb{R}.$$

In particular, $\delta(t)$ and $\delta[n]$ denote continuous-time and discrete-time Dirac functions,
which are very different indeed. The former is a generalized function (see
Section 2.4.4) while the latter is the sequence which is 1 for $n = 0$ and 0 otherwise
(the Dirac functions are also called delta or impulse functions).
In discrete-time signal processing, we will often encounter $2\pi$-periodic functions
(namely, discrete-time Fourier transforms of sequences, see Section 2.4.6), and we
will write, for example, $H(e^{j\omega})$ to make the periodicity explicit.

2.2 Hilbert Spaces 

Finite-dimensional vector spaces, as studied in linear algebra [106, 280], involve
vectors over $\mathbb{R}$ or $\mathbb{C}$ that are of finite dimension $n$. Such spaces are denoted by $\mathbb{R}^n$
and $\mathbb{C}^n$, respectively. Given a set of vectors $\{v_k\}$ in $\mathbb{R}^n$ or $\mathbb{C}^n$, important questions
include:

(a) Does the set $\{v_k\}$ span the space $\mathbb{R}^n$ or $\mathbb{C}^n$, that is, can every vector in $\mathbb{R}^n$ or
$\mathbb{C}^n$ be written as a linear combination of vectors from $\{v_k\}$?

(b) Are the vectors linearly independent, that is, is it true that no vector from
$\{v_k\}$ can be written as a linear combination of the others?

(c) How can we find bases for the space to be spanned, in particular, orthonormal
bases?

(d) Given a subspace of $\mathbb{R}^n$ or $\mathbb{C}^n$ and a general vector, how can we find an
approximation in the least-squares sense (see below) that lies in the subspace?

Two key notions used in addressing these questions include:

(a) The length, or norm,¹ of a vector (we take $\mathbb{R}^n$ as an example),

$$\|x\| = \left(\sum_{i=1}^{n} x_i^2\right)^{1/2}.$$

(b) The orthogonality of a vector with respect to another vector (or set of vectors),
for example,

$$\langle x, y\rangle = 0,$$

with an appropriately defined scalar product,

$$\langle x, y\rangle = \sum_{i=1}^{n} x_i y_i.$$

So far, we relied on the fact that the spaces were finite-dimensional. Now, the idea
is to generalize our familiar notion of a vector space to infinite dimensions. It is
necessary to restrict the vectors to have finite length or norm (even though they
are infinite-dimensional). This leads naturally to Hilbert spaces. For example, the
space of square-summable sequences, denoted by $l_2(\mathbb{Z})$, is the vector space "$\mathbb{C}^\infty$"
with a norm constraint. An example of a set of vectors spanning $l_2(\mathbb{Z})$ is the set
$\{\delta[n-k]\}$, $k \in \mathbb{Z}$. A further extension with respect to linear algebra is that vectors
can be generalized from $n$-tuples of real or complex values to include functions of
a continuous variable. The notions of norm and orthogonality can be extended to
functions using a suitable inner product between functions, which are thus viewed
as vectors. A classic example of such orthogonal vectors is the set of harmonic sine
and cosine functions, $\sin(nt)$ and $\cos(nt)$, $n = 0, 1, \ldots$, on the interval $[-\pi, \pi]$.

¹ Unless otherwise specified, we will assume a squared norm.

The classic questions from linear algebra apply here as well. In particular, the
question of completeness, that is, whether the span of the set of vectors $\{v_k\}$ covers
the whole space, becomes more involved than in the finite-dimensional case. The
norm plays a central role, since any vector in the space must be expressible as a
linear combination of the $v_k$'s such that the norm of the difference between the vector
and the linear combination of $v_k$'s is zero. For $l_2(\mathbb{Z})$, $\{\delta[n-k]\}$, $k \in \mathbb{Z}$, constitute
a complete set which is actually an orthonormal basis. For the space of square-integrable
functions over the interval $[-\pi, \pi]$, denoted by $L_2([-\pi, \pi])$, the harmonic
sines and cosines are complete since they form the basis used in the Fourier series
expansion.

If only a subset of the complete set of vectors {vk} is used, one is interested in 
the best approximation of a general element of the space by an element from the 
subspace spanned by the vectors in the subset. This question has a particularly 
easy answer when the set $\{v_k\}$ is orthonormal and the goal is least-squares approximation
(that is, the norm of the difference is minimized). Because the geometry
of Hilbert spaces is similar to Euclidean geometry, the solution is the orthogonal 
projection onto the approximation subspace, since this minimizes the distance or 
approximation error. 

In the following, we formally introduce vector spaces and in particular Hilbert 
spaces. We discuss orthogonal and general bases and their properties. We often use 
the finite-dimensional case for intuition and examples. The treatment is not very 
detailed, but sufficient for the remainder of the book. For a thorough treatment, 
we refer the reader to [113]. 

2.2.1 Vector Spaces and Inner Products 

Let us start with a formal definition of a vector space. 

Definition 2.1 

A vector space over the set of complex or real numbers, $\mathbb{C}$ or $\mathbb{R}$, is a set of
vectors, $E$, together with addition and scalar multiplication, which, for general
$x, y, z$ in $E$, and $\alpha, \beta$ in $\mathbb{C}$ or $\mathbb{R}$, satisfy the following:

(a) Commutativity: $x + y = y + x$.

(b) Associativity: $(x + y) + z = x + (y + z)$, $(\alpha\beta)x = \alpha(\beta x)$.

(c) Distributivity: $\alpha(x + y) = \alpha x + \alpha y$, $(\alpha + \beta)x = \alpha x + \beta x$.

(d) Additive identity: there exists $0$ in $E$ such that $x + 0 = x$ for all $x$ in $E$.

(e) Additive inverse: for all $x$ in $E$, there exists a $(-x)$ in $E$ such that
$x + (-x) = 0$.

(f) Multiplicative identity: $1 \cdot x = x$ for all $x$ in $E$.
Often, $x, y$ in $E$ will be $n$-tuples or sequences, and then we define

$$x + y = (x_1, x_2, \ldots) + (y_1, y_2, \ldots) = (x_1 + y_1, x_2 + y_2, \ldots),$$

$$\alpha x = \alpha(x_1, x_2, \ldots) = (\alpha x_1, \alpha x_2, \ldots).$$

While the scalars are from $\mathbb{C}$ or $\mathbb{R}$, the vectors can be arbitrary, and apart from
$n$-tuples and infinite sequences, we could also take functions over the real line.
A subset $M$ of $E$ is a subspace of $E$ if

(a) For all $x$ and $y$ in $M$, $x + y$ is in $M$.

(b) For all $x$ in $M$ and $\alpha$ in $\mathbb{C}$ or $\mathbb{R}$, $\alpha x$ is in $M$.

Given $S \subset E$, the span of $S$ is the subspace of $E$ consisting of all linear combinations
of vectors in $S$, for example, in finite dimensions,

$$\mathrm{span}(S) = \left\{ \sum_{i=1}^{n} \alpha_i x_i \;\middle|\; \alpha_i \in \mathbb{C} \text{ or } \mathbb{R},\ x_i \in S \right\}.$$

Vectors $x_1, \ldots, x_n$ are called linearly independent if $\sum_{i=1}^{n} \alpha_i x_i = 0$ is true only
if $\alpha_i = 0$ for all $i$. Otherwise, these vectors are linearly dependent. If there
are infinitely many vectors $x_1, x_2, \ldots$, they are linearly independent if for each $k$,
$x_1, x_2, \ldots, x_k$ are linearly independent.

A subset $\{x_1, \ldots, x_n\}$ of a vector space $E$ is called a basis for $E$ when
$E = \mathrm{span}(x_1, \ldots, x_n)$ and $x_1, \ldots, x_n$ are linearly independent. Then, we say that $E$ has
dimension $n$. $E$ is infinite-dimensional if it contains an infinite linearly independent
set of vectors. As an example, the space of infinite sequences is spanned by the
infinite set $\{\delta[n-k]\}_{k\in\mathbb{Z}}$. Since they are linearly independent, the space is infinite-dimensional.

Next, we equip the vector space with an inner product that is a complex function 
fundamental for defining norms and orthogonality. 

Definition 2.2 

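An inner product on a vector space $E$ over $\mathbb{C}$ (or $\mathbb{R}$) is a complex-valued
function $\langle \cdot, \cdot \rangle$, defined on $E \times E$, with the following properties:

(a) $\langle x + y, z\rangle = \langle x, z\rangle + \langle y, z\rangle$.

(b) $\langle x, \alpha y\rangle = \alpha \langle x, y\rangle$.

(c) $\langle x, y\rangle^* = \langle y, x\rangle$.

(d) $\langle x, x\rangle \ge 0$, and $\langle x, x\rangle = 0$ if and only if $x = 0$.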

Note that (b) and (c) imply $\langle \alpha x, y\rangle = \alpha^* \langle x, y\rangle$. From (a) and (b), it is clear
that the inner product is linear (in its second argument). Note that we choose the definition of the inner
product which takes the complex conjugate of the first vector (this follows from (b)).
For illustration, the standard inner products for complex-valued functions over $\mathbb{R}$
and sequences over $\mathbb{Z}$ are

$$\langle f, g\rangle = \int_{-\infty}^{\infty} f^*(t)\, g(t)\, dt,$$

and

$$\langle x, y\rangle = \sum_{n=-\infty}^{\infty} x^*[n]\, y[n],$$

respectively (if they exist). The norm of a vector is defined from the inner product
as

$$\|x\| = \sqrt{\langle x, x\rangle}, \tag{2.2.1}$$

and the distance between two vectors $x$ and $y$ is simply the norm of their difference,
$\|x - y\|$. Note that other norms can be defined (see (2.2.16)), but since we will only
use the usual Euclidean or square norm as defined in (2.2.1), we use the symbol
$\|\cdot\|$ without a particular subscript.
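Because the convention of conjugating the first argument is easy to get backwards in code, here is a minimal NumPy sketch of the sequence inner product and the norm (2.2.1) for finite-length vectors. The helper names inner and norm are ours; NumPy's own np.vdot happens to use the same convention (it conjugates its first argument), which the sketch checks.

```python
import numpy as np

def inner(x, y):
    """<x, y> = sum_n x*[n] y[n]: conjugates the FIRST argument, as in Definition 2.2."""
    x = np.asarray(x, dtype=complex)
    y = np.asarray(y, dtype=complex)
    return np.sum(np.conj(x) * y)

def norm(x):
    """||x|| = sqrt(<x, x>) as in (2.2.1); <x, x> is real and nonnegative."""
    return np.sqrt(inner(x, x).real)

x = np.array([1 + 1j, 2, -1j])
y = np.array([0.5, 1j, 3])
a = 2 - 3j

# Property (b): <x, a y> = a <x, y>, hence <a x, y> = a* <x, y>.
assert np.isclose(inner(x, a * y), a * inner(x, y))
assert np.isclose(inner(a * x, y), np.conj(a) * inner(x, y))
# Property (c): <x, y>* = <y, x>.
assert np.isclose(np.conj(inner(x, y)), inner(y, x))
# np.vdot follows the same convention.
assert np.isclose(inner(x, y), np.vdot(x, y))
print(norm(x), np.linalg.norm(x))   # both give the Euclidean norm
```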

The following hold for inner products over a vector space: 

(a) Cauchy-Schwarz inequality

$$|\langle x, y\rangle| \le \|x\|\, \|y\|, \tag{2.2.2}$$

with equality if and only if $x = \alpha y$.

(b) Triangle inequality

$$\|x + y\| \le \|x\| + \|y\|,$$

with equality if and only if $x = \alpha y$, where $\alpha$ is a positive real constant.

(c) Parallelogram law

$$\|x + y\|^2 + \|x - y\|^2 = 2\left(\|x\|^2 + \|y\|^2\right).$$
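A quick numerical sanity check of (a) to (c) on random complex vectors, assuming NumPy; the vectors, seed and the scalar used for the equality cases are arbitrary choices of ours. Such checks illustrate the inequalities but of course do not prove them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
norm = np.linalg.norm          # Euclidean norm ||x|| = sqrt(<x, x>)

print(abs(np.vdot(x, y)) <= norm(x) * norm(y))        # Cauchy-Schwarz (2.2.2)
print(norm(x + y) <= norm(x) + norm(y))               # triangle inequality
print(np.isclose(norm(x + y)**2 + norm(x - y)**2,
                 2 * (norm(x)**2 + norm(y)**2)))      # parallelogram law

# Equality cases: vectors that are multiples of each other.
a = 0.7 - 2j
print(np.isclose(abs(np.vdot(x, a * x)), norm(x) * norm(a * x)))  # C-S equality
print(np.isclose(norm(x + 3 * x), norm(x) + norm(3 * x)))         # positive real multiple
```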

Finally, the inner product can be used to define orthogonality of two vectors $x$ and
$y$, that is, vectors $x$ and $y$ are orthogonal if and only if

$$\langle x, y\rangle = 0.$$

If two vectors are orthogonal, which is denoted by $x \perp y$, then they satisfy the
Pythagorean theorem,

$$\|x + y\|^2 = \|x\|^2 + \|y\|^2,$$

since $\|x + y\|^2 = \langle x + y, x + y\rangle = \|x\|^2 + \langle x, y\rangle + \langle y, x\rangle + \|y\|^2$ and the two cross terms vanish.

A vector $x$ is said to be orthogonal to a set of vectors $S = \{y_i\}$ if $\langle x, y_i\rangle = 0$ for
all $i$. We denote this by $x \perp S$. More generally, two subspaces $S_1$ and $S_2$ are called
orthogonal if all vectors in $S_1$ are orthogonal to all of the vectors in $S_2$, and this is
written $S_1 \perp S_2$. A set of vectors $\{x_1, x_2, \ldots\}$ is called orthogonal if $x_i \perp x_j$ when
$i \ne j$. If the vectors are normalized to have unit norm, we have an orthonormal
system, which therefore satisfies

$$\langle x_i, x_j\rangle = \delta[i - j].$$

Vectors in an orthonormal system are linearly independent, since $\sum_i \alpha_i x_i = 0$ implies
$0 = \langle x_j, \sum_i \alpha_i x_i\rangle = \sum_i \alpha_i \langle x_j, x_i\rangle = \alpha_j$. An orthonormal system in a vector space $E$
is an orthonormal basis if it spans $E$.

2.2.2 Complete Inner Product Spaces 

A vector space equipped with an inner product is called an inner product space. 
One more notion is needed in order to obtain a Hilbert space: completeness. To
this end, we consider sequences of vectors $\{x_n\}$ in $E$, which are said to converge to
$x$ in $E$ if $\|x_n - x\| \to 0$ as $n \to \infty$. A sequence of vectors $\{x_n\}$ is called a Cauchy
sequence if $\|x_n - x_m\| \to 0$ when $n, m \to \infty$. If every Cauchy sequence in $E$
converges to a vector in $E$, then $E$ is called complete. This leads to the following
definition: 
Definition 2.3

A complete inner product space is called a Hilbert space. 

We are particularly interested in those Hilbert spaces which are separable because a 
Hilbert space contains a countable orthonormal basis if and only if it is separable. 
Since all Hilbert spaces with which we are going to deal are separable, we implicitly 
assume that this property is satisfied (refer to [113] for details on separability). 
Note that a closed subspace of a separable Hilbert space is separable, that is, it also 
contains a countable orthonormal basis. 

Given a Hilbert space E and a subspace S, we call the orthogonal complement 
of $S$ in $E$, denoted $S^\perp$, the set $\{x \in E \mid x \perp S\}$. Assume further that $S$ is closed,
that is, it contains all limits of sequences of vectors in $S$. Then, given a vector $y$ in
$E$, there exists a unique $v$ in $S$ and a unique $w$ in $S^\perp$ such that $y = v + w$. We can
thus write

$$E = S \oplus S^\perp,$$

or, $E$ is the direct sum of the subspace and its orthogonal complement.
Let us consider a few examples of Hilbert spaces. 

Complex/Real Spaces The complex space $\mathbb{C}^n$ is the set of all $n$-tuples $x =
(x_1, \ldots, x_n)$, with finite $x_i$ in $\mathbb{C}$. The inner product is defined as

$$\langle x, y\rangle = \sum_{i=1}^{n} x_i^* y_i,$$

and the norm is

$$\|x\| = \sqrt{\langle x, x\rangle} = \sqrt{\sum_{i=1}^{n} |x_i|^2}.$$

The above holds for the real space $\mathbb{R}^n$ as well (note that then $x_i^* = x_i$). For
example, vectors $e_i = (0, \ldots, 0, 1, 0, \ldots, 0)$, where 1 is in the $i$th position, form
an orthonormal basis both for $\mathbb{R}^n$ and $\mathbb{C}^n$. Note that these are the usual spaces
considered in linear algebra.

Space of Square-Summable Sequences In discrete-time signal processing we
will be dealing almost exclusively with sequences $x[n]$ having finite square sum or
finite energy,² where $x[n]$ is, in general, complex-valued and $n$ belongs to $\mathbb{Z}$. Such
a sequence $x[n]$ is a vector in the Hilbert space $l_2(\mathbb{Z})$. The inner product is

$$\langle x, y\rangle = \sum_{n=-\infty}^{\infty} x^*[n]\, y[n],$$

and the norm is

$$\|x\| = \sqrt{\langle x, x\rangle} = \sqrt{\sum_{n\in\mathbb{Z}} |x[n]|^2}.$$

Thus, $l_2(\mathbb{Z})$ is the space of all sequences such that $\|x\| < \infty$. This is obviously an
infinite-dimensional space, and a possible orthonormal basis is $\{\delta[n-k]\}_{k\in\mathbb{Z}}$.

For the completeness of $l_2(\mathbb{Z})$, one has to show that if $x_n[k]$ is a sequence of
vectors in $l_2(\mathbb{Z})$ such that $\|x_n - x_m\| \to 0$ as $n, m \to \infty$ (that is, a Cauchy sequence),
then there exists a limit $x$ in $l_2(\mathbb{Z})$ such that $\|x_n - x\| \to 0$. The proof can be found,
for example, in [113].

² In physical systems, the sum or integral of a squared function often corresponds to energy.

Space of Square-Integrable Functions A function $f(t)$ defined on $\mathbb{R}$ is said to
be in the Hilbert space $L_2(\mathbb{R})$ if $|f(t)|^2$ is integrable,³ that is, if

$$\int_{t\in\mathbb{R}} |f(t)|^2\, dt < \infty.$$

The inner product on $L_2(\mathbb{R})$ is given by

$$\langle f, g\rangle = \int_{t\in\mathbb{R}} f^*(t)\, g(t)\, dt,$$

and the norm is

$$\|f\| = \sqrt{\langle f, f\rangle} = \sqrt{\int_{t\in\mathbb{R}} |f(t)|^2\, dt}.$$

This space is infinite-dimensional (for example, $e^{-t^2}, t e^{-t^2}, t^2 e^{-t^2}, \ldots$ are linearly
independent).

³ Actually, $|f|^2$ has to be Lebesgue integrable.

2.2.3 Orthonormal Bases

Among all possible bases in a Hilbert space, orthonormal bases play a very important
role. We start by recalling the standard linear algebra procedure which can be
used to orthogonalize an arbitrary basis.

Gram-Schmidt Orthogonalization Given a set of linearly independent vectors
$\{x_i\}$ in $E$, we can construct an orthonormal set $\{y_i\}$ with the same span as $\{x_i\}$ as
follows: Start with

$$y_1 = \frac{x_1}{\|x_1\|}.$$

Then, recursively set

$$y_k = \frac{x_k - v_k}{\|x_k - v_k\|}, \qquad k = 2, 3, \ldots,$$

where

$$v_k = \sum_{i=1}^{k-1} \langle y_i, x_k\rangle\, y_i.$$

As will be seen shortly, the vector $v_k$ is the orthogonal projection of $x_k$ onto the
subspace spanned by the previous orthogonalized vectors; this projection is subtracted
from $x_k$, followed by normalization.
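The recursion translates almost line by line into code. The following is a minimal NumPy sketch for vectors in $\mathbb{C}^n$ stored as the columns of a matrix; the function name gram_schmidt is ours, and it assumes the columns are linearly independent. It is the classical variant written for clarity; for ill-conditioned inputs a QR factorization (for example np.linalg.qr) is numerically more robust.

```python
import numpy as np

def gram_schmidt(X):
    """Orthonormalize the linearly independent columns of X (classical Gram-Schmidt).

    y_1 = x_1 / ||x_1||,  y_k = (x_k - v_k) / ||x_k - v_k||,
    with v_k = sum_{i<k} <y_i, x_k> y_i.
    """
    X = np.asarray(X, dtype=complex)
    Y = np.zeros_like(X)
    for k in range(X.shape[1]):
        xk = X[:, k]
        # v_k: projection of x_k onto span{y_1, ..., y_{k-1}}; np.vdot conjugates y_i.
        vk = sum((np.vdot(Y[:, i], xk) * Y[:, i] for i in range(k)), np.zeros_like(xk))
        r = xk - vk
        Y[:, k] = r / np.linalg.norm(r)
    return Y

X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Y = gram_schmidt(X)
print(np.allclose(Y.conj().T @ Y, np.eye(3)))   # <y_i, y_j> = delta[i - j]
```

Because each $x_k$ lies in the span of $y_1, \ldots, y_k$, the matrix of coefficients $\langle y_i, x_k\rangle$ is upper triangular, which is the link between Gram-Schmidt and the QR factorization.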

A standard example of such an orthogonalization procedure is the Legendre
polynomials over the interval $[-1, 1]$. Start with $x_k(t) = t^k$, $k = 0, 1, \ldots$, and apply
the Gram-Schmidt procedure to get $y_k(t)$, of degree $k$, norm 1 and orthogonal to
$y_i(t)$, $i < k$ (see Problem 2.1).
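As a sketch of this example, the code below runs Gram-Schmidt on the monomials with the inner product $\langle p, q\rangle = \int_{-1}^{1} p(t) q(t)\, dt$ computed exactly from polynomial antiderivatives. It assumes NumPy's numpy.polynomial.Polynomial class; the helper names are ours. The results should be the Legendre polynomials scaled to unit norm.

```python
import math
import numpy as np
from numpy.polynomial import Polynomial

def inner(p, q):
    """<p, q> = integral of p(t) q(t) over [-1, 1] (real polynomials)."""
    F = (p * q).integ()              # an antiderivative of p q
    return float(F(1.0) - F(-1.0))

def gram_schmidt_monomials(K):
    """Orthonormalize x_k(t) = t^k, k = 0, ..., K-1, on [-1, 1]."""
    ys = []
    for k in range(K):
        xk = Polynomial([0.0] * k + [1.0])                            # t^k
        vk = sum((inner(y, xk) * y for y in ys), Polynomial([0.0]))   # projection on earlier y_i
        r = xk - vk
        ys.append(r * (1.0 / math.sqrt(inner(r, r))))                 # normalize
    return ys

ys = gram_schmidt_monomials(4)
for k, y in enumerate(ys):
    print(k, np.round(y.coef, 4))    # coefficients in increasing powers of t
```

For instance, $y_2(t)$ should come out as $\sqrt{5/2}\,(3t^2 - 1)/2$, the degree-2 Legendre polynomial divided by its norm $\sqrt{2/5}$.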

Bessel's Inequality If we have an orthonormal system of vectors $\{x_k\}$ in $E$, then
for every $y$ in $E$ the following inequality, known as Bessel's inequality, holds:

$$\|y\|^2 \ge \sum_k |\langle x_k, y\rangle|^2.$$

If we have an orthonormal system that is complete in $E$, then we have an orthonormal
basis for $E$, and Bessel's relation becomes an equality, often called Parseval's
equality (see Theorem 2.4).
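A small numerical illustration in $\mathbb{R}^6$, assuming NumPy: an orthonormal system of three vectors (obtained here, by our choice, from a reduced QR factorization of a random matrix) captures only part of the energy of $y$, while a full orthonormal basis captures all of it.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 3

# An orthonormal system of m < n vectors in R^n: orthonormal columns from a reduced QR.
Q, _ = np.linalg.qr(rng.standard_normal((n, m)))
y = rng.standard_normal(n)

coeffs = Q.T @ y                          # <x_k, y>, k = 1..m (real case)
print(np.sum(coeffs**2), y @ y)           # Bessel: the first is <= the second

# With a full orthonormal basis of R^n (n columns), the inequality becomes equality.
Qfull, _ = np.linalg.qr(rng.standard_normal((n, n)))
print(np.isclose(np.sum((Qfull.T @ y)**2), y @ y))   # Parseval
```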

Orthonormal Bases For a set of vectors $S = \{x_i\}$ to be an orthonormal basis,
we first have to check that the set of vectors $S$ is orthonormal and then that
it is complete, that is, that every vector from the space to be represented can
be expressed as a linear combination of the vectors from $S$. In other words, an
orthonormal system $\{x_i\}$ is called an orthonormal basis for $E$ if for every $y$ in $E$,

$$y = \sum_k \alpha_k x_k. \tag{2.2.3}$$

The coefficients $\alpha_k$ of the expansion are called the Fourier coefficients of $y$ (with
respect to $\{x_i\}$) and are given by

$$\alpha_k = \langle x_k, y\rangle. \tag{2.2.4}$$

This can be shown by using the continuity of the inner product (that is, if $x_n \to x$
and $y_n \to y$, then $\langle x_n, y_n\rangle \to \langle x, y\rangle$) as well as the orthogonality of the $x_k$'s. Given
that $y$ is expressed as (2.2.3), we can write

$$\langle x_k, y\rangle = \lim_{n\to\infty} \left\langle x_k, \sum_{i=1}^{n} \alpha_i x_i \right\rangle = \alpha_k,$$

where we used the linearity of the inner product.

In finite dimensions (that is, $\mathbb{R}^n$ or $\mathbb{C}^n$), having an orthonormal set of size $n$
is sufficient to have an orthonormal basis. As expected, this is more delicate in 
infinite dimensions (that is, it is not sufficient to have an infinite orthonormal set). 
The following theorem gives several equivalent statements which permit us to check 
if an orthonormal system is also a basis: 

Theorem 2.4 

Given an orthonormal system $\{x_1, x_2, \ldots\}$ in $E$, the following are equivalent:

(a) The set of vectors $\{x_1, x_2, \ldots\}$ is an orthonormal basis for $E$.

(b) If $\langle x_i, y\rangle = 0$ for $i = 1, 2, \ldots$, then $y = 0$.

(c) $\mathrm{span}(\{x_i\})$ is dense in $E$, that is, every vector in $E$ is a limit of a sequence
of vectors in $\mathrm{span}(\{x_i\})$.

(d) For every $y$ in $E$,

$$\|y\|^2 = \sum_i |\langle x_i, y\rangle|^2, \tag{2.2.5}$$

which is called Parseval's equality.

(e) For every $y_1$ and $y_2$ in $E$,

$$\langle y_1, y_2\rangle = \sum_i \langle x_i, y_1\rangle^* \langle x_i, y_2\rangle, \tag{2.2.6}$$

which is often called the generalized Parseval's equality.

For a proof, see [113]. 
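As a finite-dimensional illustration (our choice of example, not one taken from the text), the normalized complex exponentials $\varphi_k[n] = e^{j2\pi kn/N}/\sqrt{N}$, $k = 0, \ldots, N-1$, form an orthonormal basis of $\mathbb{C}^N$; their orthonormality is exactly the orthogonality of the roots of unity. The NumPy sketch below checks the expansion (2.2.3)-(2.2.4) and Parseval's equalities (2.2.5)-(2.2.6) for random vectors.

```python
import numpy as np

N = 8
n = np.arange(N)
# Column k of Phi is phi_k[n] = exp(j 2 pi k n / N) / sqrt(N).
Phi = np.exp(2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

rng = np.random.default_rng(2)
y1 = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y2 = rng.standard_normal(N) + 1j * rng.standard_normal(N)

a1 = Phi.conj().T @ y1        # Fourier coefficients alpha_k = <phi_k, y1>, (2.2.4)
a2 = Phi.conj().T @ y2

print(np.allclose(Phi @ a1, y1))                                 # expansion (2.2.3)
print(np.isclose(np.vdot(y1, y1).real, np.sum(np.abs(a1)**2)))   # Parseval (2.2.5)
print(np.isclose(np.vdot(y1, y2), np.vdot(a1, a2)))              # generalized Parseval (2.2.6)
print(np.allclose(a1, np.fft.fft(y1) / np.sqrt(N)))              # a normalized DFT
```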

Orthogonal Projection and Least-Squares Approximation Often, a vector from
a Hilbert space $E$ has to be approximated by a vector lying in a (closed) subspace $S$.
We assume that $E$ is separable; thus, $S$ contains an orthonormal basis $\{x_1, x_2, \ldots\}$.
Then, the orthogonal projection of $y \in E$ onto $S$ is given by

$$\hat{y} = \sum_i \langle x_i, y\rangle\, x_i.$$

Figure 2.1 Orthogonal projection onto a subspace. Here, $y \in \mathbb{R}^3$ and $\hat{y}$ is its
projection onto the span of $\{x_1, x_2\}$. Note that $y - \hat{y}$ is orthogonal to the span
$\{x_1, x_2\}$.

Figure 2.2 Expansion in orthogonal and biorthogonal bases. (a) Orthogonal
case: the successive approximation property holds. (b) Biorthogonal case:
the first approximation cannot be used in the full expansion.

Note that the difference $d = y - \hat{y}$ satisfies

$$d \perp S$$

and, in particular, $d \perp \hat{y}$, as well as

$$\|y\|^2 = \|\hat{y}\|^2 + \|d\|^2.$$

This is shown pictorially in Figure 2.1. An important property of such an approximation
is that it is best in the least-squares sense, that is,

$$\min_{x \in S} \|y - x\|$$

is attained for $x = \sum_i \alpha_i x_i$ with

$$\alpha_i = \langle x_i, y\rangle,$$

that is, the Fourier coefficients. An immediate consequence of this result is the
successive approximation property of orthogonal expansions. Call $\hat{y}^{(k)}$ the best
approximation of $y$ on the subspace spanned by $\{x_1, x_2, \ldots, x_k\}$ and given by the
coefficients $\{\alpha_1, \alpha_2, \ldots, \alpha_k\}$ where $\alpha_i = \langle x_i, y\rangle$. Then, the approximation $\hat{y}^{(k+1)}$ is
given by

$$\hat{y}^{(k+1)} = \hat{y}^{(k)} + \langle x_{k+1}, y\rangle\, x_{k+1},$$

that is, the previous approximation plus the projection along the added vector $x_{k+1}$.
While this is obvious, it is worth pointing out that this successive approximation
property does not hold for nonorthogonal bases. When calculating the approximation
$\hat{y}^{(k+1)}$, one cannot simply add one term to the previous approximation, but has
to recalculate the whole approximation (see Figure 2.2). For a further discussion
of projection operators, see Appendix 2.A.
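The contrast between the two cases can be seen in $\mathbb{R}^3$ with a few hand-picked vectors (our own example, assuming NumPy). With orthonormal $x_1, x_2$, adding a term leaves the earlier coefficient untouched and the residual is orthogonal to the subspace; with the nonorthogonal pair $u_1, u_2$, the least-squares coefficient of $u_1$ changes when $u_2$ is added.

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])

# Orthonormal case: y^(2) = y^(1) + <x2, y> x2, nothing is recomputed.
x1 = np.array([1.0, 0.0, 0.0])
x2 = np.array([0.0, 1.0, 0.0])
y_hat1 = (x1 @ y) * x1
y_hat2 = y_hat1 + (x2 @ y) * x2
d = y - y_hat2
print(y_hat2, np.isclose(d @ x1, 0), np.isclose(d @ x2, 0))   # residual is orthogonal to S
print(np.isclose(y @ y, y_hat2 @ y_hat2 + d @ d))             # ||y||^2 = ||y_hat||^2 + ||d||^2

# Nonorthogonal case: the best coefficients must be recomputed from scratch.
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([1.0, 1.0, 0.0])
c1, *_ = np.linalg.lstsq(u1[:, None], y, rcond=None)                 # best fit using u1 only
c12, *_ = np.linalg.lstsq(np.column_stack([u1, u2]), y, rcond=None)  # best fit using u1, u2
print(c1, c12)   # the coefficient of u1 changes once u2 is added
```

Here the coefficient of $u_1$ is 1 when it is used alone but $-1$ once $u_2$ joins it, even though both fits are optimal for their respective subspaces.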

2.2.4 General Bases 

While orthonormal bases are very convenient, the more general case of nonorthog- 
onal or biorthogonal bases is important as well. In particular, biorthogonal bases 
will be constructed in Chapters 3 and 4. A system $\{x_i, \tilde{x}_i\}$ constitutes a pair of
biorthogonal bases of a Hilbert space $E$ if and only if [56, 73]

(a) For all $i, j$ in $\mathbb{Z}$,

$$\langle x_i, \tilde{x}_j\rangle = \delta[i - j]. \tag{2.2.7}$$

(b) There exist strictly positive constants $A$, $B$, $\tilde{A}$, $\tilde{B}$ such that, for all $y$ in $E$,

$$A\|y\|^2 \le \sum_k |\langle x_k, y\rangle|^2 \le B\|y\|^2, \tag{2.2.8}$$

$$\tilde{A}\|y\|^2 \le \sum_k |\langle \tilde{x}_k, y\rangle|^2 \le \tilde{B}\|y\|^2. \tag{2.2.9}$$

Compare these inequalities with (2.2.5) in the orthonormal case. Bases which satisfy
(2.2.8) or (2.2.9) are called Riesz bases [73]. Then, the signal expansion formula
becomes

$$y = \sum_k \langle x_k, y\rangle\, \tilde{x}_k = \sum_k \langle \tilde{x}_k, y\rangle\, x_k. \tag{2.2.10}$$

It is clear why the term biorthogonal is used, since to the (nonorthogonal) basis
$\{x_i\}$ corresponds a dual basis $\{\tilde{x}_i\}$ which satisfies the biorthogonality constraint
(2.2.7). If the basis $\{x_i\}$ is orthonormal, then it is its own dual, and the expansion
formula (2.2.10) becomes the usual orthogonal expansion given by (2.2.3)-(2.2.4).

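Equivalences similar to Theorem 2.4 hold in the biorthogonal case as well, and
we give the Parseval relations, which become

$$\|y\|^2 = \sum_i \langle x_i, y\rangle^* \langle \tilde{x}_i, y\rangle, \tag{2.2.11}$$

and

$$\langle y_1, y_2\rangle = \sum_i \langle x_i, y_1\rangle^* \langle \tilde{x}_i, y_2\rangle \tag{2.2.12}$$

$$\langle y_1, y_2\rangle = \sum_i \langle \tilde{x}_i, y_1\rangle^* \langle x_i, y_2\rangle. \tag{2.2.13}$$

For a proof, see [213] and Problem 2.8.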
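In $\mathbb{R}^n$ the dual basis can be computed directly: if the columns of a matrix $\mathbf{X}$ are the $x_i$, condition (2.2.7) reads $\mathbf{X}^* \tilde{\mathbf{X}} = \mathbf{I}$, so $\tilde{\mathbf{X}} = (\mathbf{X}^*)^{-1}$. The NumPy sketch below uses an arbitrary nonorthogonal basis of $\mathbb{R}^2$ of our choosing to check biorthogonality, both forms of (2.2.10), and (2.2.11). It also computes the tightest constants in (2.2.8) as the extreme eigenvalues of $\mathbf{X}\mathbf{X}^*$, since $\sum_k |\langle x_k, y\rangle|^2 = y^* \mathbf{X}\mathbf{X}^* y$.

```python
import numpy as np

# A nonorthogonal basis of R^2, one vector per column (an arbitrary example).
X = np.array([[1.0, 1.0],
              [0.0, 2.0]])
# Dual basis: <x_i, x~_j> = delta[i - j] means X^* X~ = I, so X~ = (X^*)^{-1}.
Xd = np.linalg.inv(X.conj().T)
print(np.allclose(X.conj().T @ Xd, np.eye(2)))        # biorthogonality (2.2.7)

y = np.array([3.0, -1.0])
a = X.conj().T @ y       # <x_k, y>
ad = Xd.conj().T @ y     # <x~_k, y>
print(np.allclose(X @ ad, y), np.allclose(Xd @ a, y))  # both forms of (2.2.10)
print(np.isclose(y @ y, np.vdot(a, ad)))               # Parseval relation (2.2.11)

# Tightest constants in (2.2.8): extreme eigenvalues of X X^*.
A, B = np.linalg.eigvalsh(X @ X.conj().T)[[0, -1]]
print(A, B)
```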

2.2.5 Overcomplete Expansions 

So far, we have considered signal expansion onto bases, that is, the vectors used 
in the expansion were linearly independent. However, one can also write signals in 
terms of a linear combination of an overcomplete set of vectors, where the vectors 
are not independent anymore. A more detailed treatment of such overcomplete sets 
of vectors, called frames, can be found in Chapter 5 and in [73, 89]. We will only 
discuss a few basic notions here. 

A family of functions $\{x_k\}$ in a Hilbert space $H$ is called a frame if there exist
two constants $A > 0$, $B < \infty$, such that for all $y$ in $H$

$$A\|y\|^2 \le \sum_k |\langle x_k, y\rangle|^2 \le B\|y\|^2.$$

$A$ and $B$ are called frame bounds, and when they are equal, we call the frame tight. In
a tight frame we have

$$\sum_k |\langle x_k, y\rangle|^2 = A\|y\|^2,$$

and the signal can be expanded as follows:

$$y = A^{-1} \sum_k \langle x_k, y\rangle\, x_k. \tag{2.2.14}$$

While this last equation resembles the expansion formula in the case of an orthonormal
basis, a frame does not constitute an orthonormal basis in general. In
particular, the vectors may be linearly dependent and thus not form a basis. If all
the vectors in a tight frame have unit norm, then the constant $A$ gives the redundancy
ratio (for example, $A = 2$ means there are twice as many vectors as needed
to cover the space). Note that if $A = B = 1$ and $\|x_k\| = 1$ for all $k$, then $\{x_k\}$
constitutes an orthonormal basis.

Because of the linear dependence which exists among the vectors used in the 
expansion, the expansion is not unique anymore. Consider the set $\{x_1, x_2, \ldots\}$
where $\sum_i \beta_i x_i = 0$ (where not all $\beta_i$'s are zero) because of linear dependence. If $y$
can be written as

$$y = \sum_i \alpha_i x_i, \tag{2.2.15}$$

then one can add $\beta_i$ to each $\alpha_i$ without changing the validity of the expansion
(2.2.15). The expansion (2.2.14) is unique in the sense that it minimizes the norm
of the expansion among all valid expansions. Similarly, for general frames, there 
exists a unique dual frame which is discussed in Section 5.3.2 (in the tight frame 
case, the frame and its dual are equal). 
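A standard small example of a tight frame (our choice, not from the text) is three unit vectors in $\mathbb{R}^2$ spaced 120 degrees apart. The NumPy sketch below checks that $A = B = 3/2$, matching the redundancy ratio of three vectors in two dimensions, that (2.2.14) reconstructs $y$, and that since $x_1 + x_2 + x_3 = 0$ other coefficient sets also work but have a larger norm. The last line compares with the minimum-norm solution given by the pseudoinverse.

```python
import numpy as np

# Three unit vectors in R^2 at 120-degree spacing (columns of F).
angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
F = np.vstack([np.cos(angles), np.sin(angles)])     # shape (2, 3)

y = np.array([0.3, -1.2])
c = F.T @ y                                  # <x_k, y>
A = np.sum(c**2) / (y @ y)
print(np.isclose(A, 1.5))                    # tight frame with A = B = 3/2
print(np.allclose(F @ c / A, y))             # reconstruction (2.2.14)

# Linear dependence: x_1 + x_2 + x_3 = 0, so adding beta to every coefficient is still valid.
alpha = c / A
beta = 0.4 * np.ones(3)
print(np.allclose(F @ (alpha + beta), y))                     # another valid expansion (2.2.15)
print(np.linalg.norm(alpha) < np.linalg.norm(alpha + beta))   # (2.2.14) has the smaller norm
print(np.allclose(alpha, np.linalg.pinv(F) @ y))              # equals the minimum-norm solution
```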

This concludes for now our brief introduction of signal expansions. Later, more 
specific expansions will be discussed, such as Fourier and wavelet expansions. The 
fundamental properties seen above will reappear in more specialized forms (for 
example, Parseval's equality). 

While we have only discussed Hilbert spaces, there are of course many other spaces of functions which are of interest. For example, $L_p(\mathbb{R})$ spaces are those containing functions $f$ for which $|f|^p$ is integrable [113]. The norm on these spaces is defined as

$$\|f\|_p = \left( \int_{-\infty}^{\infty} |f(t)|^p \, dt \right)^{1/p}, \qquad (2.2.16)$$

which for $p = 2$ is the usual $L_2$ norm.⁴ Two $L_p$ spaces which will be useful later are $L_1(\mathbb{R})$, the space of functions $f(t)$ satisfying $\int_{-\infty}^{\infty} |f(t)|\, dt < \infty$, and $L_\infty(\mathbb{R})$, the space of functions $f(t)$ such that $\sup|f(t)| < \infty$. Their discrete-time equivalents are $l_1(\mathbb{Z})$ (space of sequences $x[n]$ such that $\sum_n |x[n]| < \infty$) and $l_\infty(\mathbb{Z})$ (space of sequences $x[n]$ such that $\sup|x[n]| < \infty$). Associated with these spaces are the corresponding norms. However, many of the intuitive geometric interpretations we have seen so far for $L_2(\mathbb{R})$ and $l_2(\mathbb{Z})$ do not hold in these spaces (see Problem 2.3). Recall that in the following, since we use mostly $L_2$ and $l_2$, we use $\|\cdot\|$ to mean $\|\cdot\|_2$.
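To see concretely why only $p = 2$ comes from an inner product, the following sketch (assuming NumPy; the helper names `lp_norm` and `parallelogram_gap` are ours) evaluates the parallelogram identity $\|x+y\|^2 + \|x-y\|^2 = 2(\|x\|^2 + \|y\|^2)$, which a norm satisfies exactly when it is induced by an inner product (the Jordan-von Neumann theorem). For two unit coordinate vectors the identity holds for $p = 2$ and fails for $p = 1$ and $p = \infty$:

```python
import numpy as np

def lp_norm(x, p):
    """||x||_p of a finite sequence; p = np.inf gives the sup norm."""
    x = np.abs(np.asarray(x, dtype=float))
    if np.isinf(p):
        return x.max()
    return np.sum(x ** p) ** (1.0 / p)

def parallelogram_gap(x, y, p):
    """||x+y||^2 + ||x-y||^2 - 2(||x||^2 + ||y||^2) in the l_p norm."""
    def norm(v):
        return lp_norm(v, p)
    return norm(x + y) ** 2 + norm(x - y) ** 2 - 2 * (norm(x) ** 2 + norm(y) ** 2)

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
for p in (1, 2, np.inf):
    print(p, lp_norm(x + y, p), parallelogram_gap(x, y, p))  # gap is zero only for p = 2
```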

2.3 Elements of Linear Algebra 

The finite-dimensional cases of Hilbert spaces, namely $\mathbb{R}^n$ and $\mathbb{C}^n$, are very important, and linear operators on such spaces are studied in linear algebra. Many good reference texts exist on the subject, see [106, 280]. Good reviews can also be found in [150] and [308]. We give only a brief account here, focusing on basic concepts and topics which are needed later, such as polynomial matrices.

⁴ For $p \neq 2$, the norm $\|\cdot\|_p$ cannot be derived from an inner product as in Definition 2.2.

2.3.1 Basic Definitions and Properties 

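We can view matrices as representations of bounded linear operators (see Appendix 2.A). The familiar system of equations

$$\begin{aligned} A_{11}x_1 + \cdots + A_{1n}x_n &= y_1, \\ &\;\;\vdots \\ A_{m1}x_1 + \cdots + A_{mn}x_n &= y_m, \end{aligned}$$

can be compactly represented as

$$\mathbf{A}\mathbf{x} = \mathbf{y}. \qquad (2.3.1)$$

Therefore, any finite matrix, or a rectangular ($m$ rows and $n$ columns) array of numbers, can be interpreted as an operator $\mathbf{A}$

$$\mathbf{A} = \begin{pmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & \ddots & \vdots \\ A_{m1} & \cdots & A_{mn} \end{pmatrix}.$$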

An $m \times 1$ matrix is called a column vector, while a $1 \times n$ matrix is a row vector. As seen in (2.3.1), we write matrices as bold capital letters, and column vectors as lower-case bold letters. A row vector would then be written as $\mathbf{v}^T$, where $T$ denotes transposition (interchange of rows and columns, that is, if $\mathbf{A}$ has elements $A_{ij}$, $\mathbf{A}^T$ has elements $A_{ji}$). If the entries are complex, one often uses hermitian transposition, which is complex conjugation followed by usual transposition, and is denoted by a superscript $*$.

When $m = n$, the matrix is called square; otherwise it is called rectangular. A $1 \times 1$ matrix is called scalar. We denote by $\mathbf{0}$ the null matrix (all elements are zero) and by $\mathbf{I}$ the identity ($A_{ii} = 1$, and $0$ otherwise). The identity matrix is a special case of a diagonal matrix. The antidiagonal matrix $\mathbf{J}$ has all the elements on the other diagonal equal to 1, while the rest are 0, that is, $A_{ij} = 1$ for $j = n + 1 - i$, and $A_{ij} = 0$ otherwise. A lower (or upper) triangular matrix is a square matrix with all of its elements above (or below) the main diagonal equal to zero.

Besides addition/subtraction of same-size matrices (by adding/subtracting the corresponding elements), one can multiply matrices $\mathbf{A}$ and $\mathbf{B}$ with sizes $m \times n$ and $n \times p$ respectively, yielding a matrix $\mathbf{C}$ whose elements are given by

$$C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}.$$

Note that the matrix product is not commutative in general, that is, $\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}$.⁵ It can be shown that $(\mathbf{A}\mathbf{B})^T = \mathbf{B}^T\mathbf{A}^T$.
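The element formula for $C_{ij}$ can be written out directly. The sketch below (assuming NumPy; `matmul` is a hypothetical helper written for clarity rather than speed) implements the sum over $k$ and uses a small pair of matrices to show that $\mathbf{AB} \neq \mathbf{BA}$ while $(\mathbf{AB})^T = \mathbf{B}^T\mathbf{A}^T$:

```python
import numpy as np

def matmul(A, B):
    """C_ij = sum_k A_ik B_kj for an m x n matrix A and an n x p matrix B."""
    m, n = A.shape
    n2, p = B.shape
    if n != n2:
        raise ValueError("inner dimensions must agree")
    C = np.zeros((m, p), dtype=np.result_type(A, B))
    for i in range(m):
        for j in range(p):
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return C

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
print(matmul(A, B))   # B on the right swaps the columns of A
print(matmul(B, A))   # B on the left swaps the rows of A, so AB != BA
print(np.array_equal(matmul(A, B).T, matmul(B.T, A.T)))  # (AB)^T = B^T A^T
```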

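The inner product of two (column) vectors from $\mathbb{R}^n$ is $\langle \mathbf{v}_1, \mathbf{v}_2 \rangle = \mathbf{v}_1^T \cdot \mathbf{v}_2$, and if the vectors are from $\mathbb{C}^n$, then $\langle \mathbf{v}_1, \mathbf{v}_2 \rangle = \mathbf{v}_1^* \cdot \mathbf{v}_2$. The outer product of two vectors from $\mathbb{R}^n$ and $\mathbb{R}^m$ is an $n \times m$ matrix given by $\mathbf{v}_1 \cdot \mathbf{v}_2^T$.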

To define the notion of a determinant, we first need to define a minor. A minor $\mathbf{M}_{ij}$ is a submatrix of the matrix $\mathbf{A}$ obtained by deleting its $i$th row and $j$th column. More generally, a minor can be any submatrix of the matrix $\mathbf{A}$ obtained by deleting some of its rows and columns. Then the determinant of an $n \times n$ matrix can be defined recursively as

$$\det(\mathbf{A}) = \sum_{i=1}^{n} A_{ij}\, (-1)^{i+j} \det(\mathbf{M}_{ij}),$$

where $j$ is fixed and belongs to $\{1, \ldots, n\}$. The cofactor $C_{ij}$ is $(-1)^{i+j} \det(\mathbf{M}_{ij})$. A square matrix is said to be singular if $\det(\mathbf{A}) = 0$. The product of two matrices is nonsingular only if both matrices are nonsingular. Some properties of interest include the following:

(a) If $\mathbf{C} = \mathbf{A}\mathbf{B}$, then $\det(\mathbf{C}) = \det(\mathbf{A}) \det(\mathbf{B})$.

(b) If $\mathbf{B}$ is obtained by interchanging two rows/columns of $\mathbf{A}$, then $\det(\mathbf{B}) = -\det(\mathbf{A})$.

(c) $\det(\mathbf{A}^T) = \det(\mathbf{A})$.

(d) For an $n \times n$ matrix $\mathbf{A}$, $\det(c\mathbf{A}) = c^n \det(\mathbf{A})$.

(e) The determinant of a triangular, and in particular, of a diagonal matrix is the product of the elements on the main diagonal.
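The recursive definition translates almost line by line into code. This is a sketch assuming NumPy (`det_cofactor` is our own name); its cost grows like $n!$, so it is only meant to check the definition, and properties (a), (b), (d) and (e), on a small random matrix against `numpy.linalg.det`:

```python
import numpy as np

def det_cofactor(A, j=0):
    """Determinant by cofactor expansion down column j (cost grows like n!)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for i in range(n):
        minor = np.delete(np.delete(A, i, axis=0), j, axis=1)  # M_ij
        total += A[i, j] * (-1) ** (i + j) * det_cofactor(minor)
    return total

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
print(np.isclose(det_cofactor(A), np.linalg.det(A)))
print(np.isclose(det_cofactor(A, j=2), det_cofactor(A)))                   # any fixed j works
print(np.isclose(det_cofactor(A @ B), det_cofactor(A) * det_cofactor(B)))  # property (a)
print(np.isclose(det_cofactor(A[[1, 0, 2, 3]]), -det_cofactor(A)))         # property (b)
print(np.isclose(det_cofactor(2 * A), 2 ** 4 * det_cofactor(A)))           # property (d)
print(np.isclose(det_cofactor(np.triu(A)), np.prod(np.diag(A))))           # property (e)
```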

An important interpretation of the determinant is that it corresponds to the volume 
of the parallelepiped obtained when taking the column vectors of the matrix as its 
edges (one can take the row vectors as well, leading to a different parallelepiped, 
but the volume remains the same). Thus, a zero determinant indicates linear de- 
pendence of the row and column vectors of the matrix, since the parallelepiped is 
not of full dimension. 

The rank of a matrix is the size of its largest nonsingular minor (possibly the matrix itself). In a rectangular $m \times n$ matrix, the column rank equals the row rank, that is, the number of linearly independent rows equals the number of linearly independent columns. In other words, the dimension of span(columns) is equal to the dimension of span(rows). For an $n \times n$ matrix to be nonsingular, its rank should equal $n$. Also $\operatorname{rank}(\mathbf{A}\mathbf{B}) \le \min(\operatorname{rank}(\mathbf{A}), \operatorname{rank}(\mathbf{B}))$.

⁵ When there is possible confusion, we will denote a matrix product by $\mathbf{A} \cdot \mathbf{B}$; otherwise we will simply write $\mathbf{A}\mathbf{B}$.

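For a square nonsingular matrix $\mathbf{A}$, the inverse matrix $\mathbf{A}^{-1}$ can be computed using Cramer's formula

$$\mathbf{A}^{-1} = \frac{\operatorname{adjugate}(\mathbf{A})}{\det(\mathbf{A})},$$

where the elements of $\operatorname{adjugate}(\mathbf{A})$ are $(\operatorname{adjugate}(\mathbf{A}))_{ij} = \text{cofactor of } A_{ji} = C_{ji}$. For a square matrix, $\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$. Also, $(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$. Note that Cramer's formula is not actually used to compute the inverse in practice; rather, it serves as a tool in proofs.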
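For illustration only, Cramer's formula can be coded directly from the cofactors. The sketch below assumes NumPy and uses a hypothetical helper `adjugate` (valid for $n \ge 2$); as the text notes, a linear system would normally be solved with `numpy.linalg.solve` rather than by forming the inverse:

```python
import numpy as np

def adjugate(A):
    """adjugate(A)_ij = C_ji, the (j, i) cofactor of A (for n >= 2)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)   # cofactor C_ij
    return C.T

A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]])
A_inv = adjugate(A) / np.linalg.det(A)          # Cramer's formula
print(np.allclose(A @ A_inv, np.eye(3)))

# In practice, solve the system directly instead of forming the inverse.
b = np.array([1.0, 2.0, 3.0])
print(np.allclose(np.linalg.solve(A, b), A_inv @ b))
```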

For an m x n rectangular matrix A, an n x m matrix L is its left inverse if 
LA = I. Similarly, an n x m matrix R is a right inverse of A if AR = I. These 
inverses are not unique and may not even exist. However, if the matrix A is square 
and has full rank, then its right inverse equals its left inverse, and we can apply 
Cramer's formula to find that inverse. 

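The Kronecker product of two matrices is defined as (we show a $2 \times 2$ matrix as an example)

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \otimes \mathbf{M} = \begin{pmatrix} a\mathbf{M} & b\mathbf{M} \\ c\mathbf{M} & d\mathbf{M} \end{pmatrix}, \qquad (2.3.2)$$

where $a$, $b$, $c$ and $d$ are scalars and $\mathbf{M}$ is a matrix (neither matrix need be square). See Problem 2.19 for an application of Kronecker products. The Kronecker product has the following useful property with respect to the usual matrix product [32]:

$$(\mathbf{A} \otimes \mathbf{B})(\mathbf{C} \otimes \mathbf{D}) = (\mathbf{A}\mathbf{C}) \otimes (\mathbf{B}\mathbf{D}), \qquad (2.3.3)$$

where all the matrix products have to be well-defined.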
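The mixed-product property (2.3.3) is easy to spot-check numerically with `numpy.kron`, which implements the block construction of (2.3.2). The sizes below are arbitrary choices that make all products well-defined; this is a sanity check under those assumptions, not a proof:

```python
import numpy as np

rng = np.random.default_rng(2)
A, C = rng.standard_normal((2, 3)), rng.standard_normal((3, 2))  # AC is 2 x 2
B, D = rng.standard_normal((4, 2)), rng.standard_normal((2, 5))  # BD is 4 x 5

lhs = np.kron(A, B) @ np.kron(C, D)   # (8 x 6) @ (6 x 10)
rhs = np.kron(A @ C, B @ D)           # (2 x 2) kron (4 x 5) = 8 x 10
print(lhs.shape, np.allclose(lhs, rhs))

# Block structure of (2.3.2) with [[a, b], [c, d]] = [[1, 0], [0, -1]]
M = np.array([[1, 2], [3, 4]])
print(np.kron(np.array([[1, 0], [0, -1]]), M))   # [[M, 0], [0, -M]]
```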

2.3.2 Linear Systems of Equations and Least Squares 

Going back to the equation $\mathbf{A}\mathbf{x} = \mathbf{y}$, one can say that the system has a unique solution provided $\mathbf{A}$ is nonsingular, and this solution is given by $\mathbf{x} = \mathbf{A}^{-1}\mathbf{y}$. Note that one would rarely compute the inverse matrix in order to solve a linear system of equations; rather, Gaussian elimination would be used, since it is much more efficient. In the following, the column space of $\mathbf{A}$ denotes the linear span of the columns of $\mathbf{A}$, and similarly, the row space is the linear span of the rows of $\mathbf{A}$.

Let us give an interpretation of solving the problem $\mathbf{A}\mathbf{x} = \mathbf{y}$. The product $\mathbf{A}\mathbf{x}$ constitutes a linear combination of the columns of $\mathbf{A}$ weighted by the entries of $\mathbf{x}$. Thus, if $\mathbf{y}$ belongs to the column space of $\mathbf{A}$, also called the range of $\mathbf{A}$, there will be a solution. If the columns are linearly independent, the solution is unique; if they are not, there are infinitely many solutions. The null space of $\mathbf{A}$ consists of the vectors orthogonal to the row space, that is, the vectors $\mathbf{v}$ with $\mathbf{A}\mathbf{v} = \mathbf{0}$. If $\mathbf{A}$ is of size $m \times n$ (the system of equations has $m$ equations in $n$ unknowns), then the dimension of the range (which equals the rank $\rho$) plus the dimension of the null space is equal to $n$. A similar relation holds for row spaces (which are column spaces of $\mathbf{A}^T$) together with the null space of $\mathbf{A}^T$, and the sum is then equal to $m$. If $\mathbf{y}$ is not in the range of $\mathbf{A}$, there is no exact solution and only approximations are possible, such as the orthogonal projection of $\mathbf{y}$ onto the span of the columns of $\mathbf{A}$, which results in a least-squares solution. Then, the error between $\mathbf{y}$ and its projection $\hat{\mathbf{y}}$ (see Figure 2.1) is orthogonal to the column space of $\mathbf{A}$. That is, any linear combination of the columns of $\mathbf{A}$, for example $\mathbf{A}\boldsymbol{\alpha}$, is orthogonal to $\mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{A}\hat{\mathbf{x}}$, where $\hat{\mathbf{x}}$ is the least-squares solution. Thus

$$(\mathbf{A}\boldsymbol{\alpha})^T (\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}) = 0 \quad \text{for all } \boldsymbol{\alpha},$$

or

$$\mathbf{A}^T\mathbf{A}\hat{\mathbf{x}} = \mathbf{A}^T\mathbf{y},$$

which are called the normal equations of the least-squares problem. If the columns of $\mathbf{A}$ are linearly independent, then $\mathbf{A}^T\mathbf{A}$ is invertible. The unique least-squares solution is

$$\hat{\mathbf{x}} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{y} \qquad (2.3.4)$$

(recall that $\mathbf{A}$ is either rectangular or rank deficient, and does not have a proper inverse) and the orthogonal projection $\hat{\mathbf{y}}$ is equal to

$$\hat{\mathbf{y}} = \mathbf{A}(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{y}. \qquad (2.3.5)$$

Note that the matrix $\mathbf{P} = \mathbf{A}(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T$ satisfies $\mathbf{P}^2 = \mathbf{P}$ and is symmetric, $\mathbf{P} = \mathbf{P}^T$, thus satisfying the condition for an orthogonal projection operator (see Appendix 2.A). Also, it can be verified that the partial derivatives of the squared error with respect to the components of $\hat{\mathbf{x}}$ are zero for the above choice (see Problem 2.6).
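The normal equations translate into a few lines of NumPy. In this sketch (random data, with a tall matrix assumed to have linearly independent columns), the solution of $\mathbf{A}^T\mathbf{A}\hat{\mathbf{x}} = \mathbf{A}^T\mathbf{y}$ is compared with `numpy.linalg.lstsq`, the projection matrix $\mathbf{P}$ is checked to be idempotent and symmetric, and the error $\mathbf{y} - \hat{\mathbf{y}}$ is checked to be orthogonal to the columns of $\mathbf{A}$. For badly conditioned problems, QR- or SVD-based solvers such as `lstsq` are usually preferred over forming $\mathbf{A}^T\mathbf{A}$:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3))   # tall matrix; full column rank with probability 1
y = rng.standard_normal(6)

# Normal equations A^T A x = A^T y, solved without an explicit inverse
x_hat = np.linalg.solve(A.T @ A, A.T @ y)
P = A @ np.linalg.solve(A.T @ A, A.T)        # P = A (A^T A)^{-1} A^T
y_hat = P @ y

print(np.allclose(x_hat, np.linalg.lstsq(A, y, rcond=None)[0]))
print(np.allclose(P @ P, P), np.allclose(P, P.T))   # idempotent and symmetric
print(np.allclose(A.T @ (y - y_hat), 0))            # error orthogonal to the columns
```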

2.3.3 Eigenvectors and Eigenvalues 

The characteristic polynomial for a matrix $\mathbf{A}$ is $D(x) = \det(x\mathbf{I} - \mathbf{A})$, whose roots are called eigenvalues $\lambda_i$. In particular, a vector $\mathbf{p} \neq \mathbf{0}$ for which

$$\mathbf{A}\mathbf{p} = \lambda\mathbf{p}$$

is an eigenvector associated with the eigenvalue $\lambda$. If a matrix of size $n \times n$ has $n$ linearly independent eigenvectors, then it can be diagonalized, that is, it can be written as

$$\mathbf{A} = \mathbf{T}\boldsymbol{\Lambda}\mathbf{T}^{-1},$$

where $\boldsymbol{\Lambda}$ is a diagonal matrix containing the eigenvalues of $\mathbf{A}$ along the diagonal and $\mathbf{T}$ contains its eigenvectors as its columns. An important case is when $\mathbf{A}$ is symmetric or, in the complex case, hermitian symmetric, $\mathbf{A}^* = \mathbf{A}$. Then, the eigenvalues are real, and a full set of orthogonal eigenvectors exists. Taking them as columns of a matrix $\mathbf{U}$ after normalizing them to have unit norm so that $\mathbf{U}^*\mathbf{U} = \mathbf{I}$, we can write a hermitian symmetric matrix as

$$\mathbf{A} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^*.$$

This result constitutes the spectral theorem for hermitian matrices. Hermitian 
symmetric matrices commute with their hermitian transpose. More generally, a 
matrix N that commutes with its hermitian transpose is called normal, that is, it 
satisfies N*N = NN*. Normal matrices are exactly those that have a complete 
set of orthogonal eigenvectors. 

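The importance of eigenvectors in the study of linear operators comes from the following fact: assuming a full set of eigenvectors, a vector $\mathbf{x}$ can be written as a linear combination of eigenvectors $\mathbf{x} = \sum_i \alpha_i \mathbf{v}_i$. Then,

$$\mathbf{A}\mathbf{x} = \mathbf{A}\Big(\sum_i \alpha_i \mathbf{v}_i\Big) = \sum_i \alpha_i (\mathbf{A}\mathbf{v}_i) = \sum_i \alpha_i \lambda_i \mathbf{v}_i.$$

That is, in the eigenvector basis, $\mathbf{A}$ simply scales each coordinate by the corresponding eigenvalue.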

The concept of eigenvectors generalizes to eigenfunctions for continuous operators, which are functions $f_\omega(t)$ such that $A f_\omega(t) = \lambda(\omega) f_\omega(t)$. A classic example is the complex sinusoid, which is an eigenfunction of the convolution operator, as will be shown in Section 2.4.
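The spectral theorem and the eigenvector expansion of $\mathbf{A}\mathbf{x}$ given earlier can be checked numerically. This sketch assumes NumPy and builds a random hermitian matrix; `numpy.linalg.eigh` returns real eigenvalues and orthonormal eigenvectors, so the coordinates $\alpha_i$ of $\mathbf{x}$ are simply the entries of $\mathbf{U}^*\mathbf{x}$:

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B + B.conj().T                    # hermitian symmetric: A* = A

lam, U = np.linalg.eigh(A)            # real eigenvalues, orthonormal eigenvectors
print(np.allclose(U.conj().T @ U, np.eye(4)))
print(np.allclose(A, U @ np.diag(lam) @ U.conj().T))   # A = U Lambda U*

x = rng.standard_normal(4)
alpha = U.conj().T @ x                # coordinates of x in the eigenvector basis
print(np.allclose(A @ x, U @ (lam * alpha)))           # A x = sum_i alpha_i lambda_i v_i
```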

2.3.4 Unitary Matrices 

We just explained an instance of a square unitary matrix, that is, an $m \times m$ matrix $\mathbf{U}$ which satisfies

$$\mathbf{U}^*\mathbf{U} = \mathbf{U}\mathbf{U}^* = \mathbf{I}, \qquad (2.3.6)$$

or, its inverse is its (hermitian) transpose. When the matrix has real entries, it is often called orthogonal or orthonormal, and sometimes, a scale factor is allowed on the left of (2.3.6). Rectangular unitary matrices are also possible, that is, an $m \times n$ matrix $\mathbf{U}$ with $m > n$ is unitary if

$$\|\mathbf{U}\mathbf{x}\| = \|\mathbf{x}\|, \quad \forall \mathbf{x} \in \mathbb{C}^n,$$

as well as

$$\langle \mathbf{U}\mathbf{x}, \mathbf{U}\mathbf{y} \rangle = \langle \mathbf{x}, \mathbf{y} \rangle, \quad \forall \mathbf{x}, \mathbf{y} \in \mathbb{C}^n,$$

which are the usual Parseval's relations. Then it follows that

$$\mathbf{U}^*\mathbf{U} = \mathbf{I},$$

where $\mathbf{I}$ is of size $n \times n$ (and the product does not commute, since $\mathbf{U}\mathbf{U}^*$ is an $m \times m$ matrix of rank $n$). Square unitary matrices have eigenvalues of unit modulus and a complete set of orthogonal eigenvectors. Note that a unitary matrix performs a rotation; thus, the $l_2$ norm is preserved.

When a square $m \times m$ matrix $\mathbf{A}$ has full rank, its columns (or rows) form a basis for $\mathbb{R}^m$, and we recall that the Gram-Schmidt orthogonalization procedure can be used to get an orthonormal basis. Gathering the steps of the Gram-Schmidt procedure into a matrix form, we can write $\mathbf{A}$ as

$$\mathbf{A} = \mathbf{Q}\mathbf{R},$$

where the columns of $\mathbf{Q}$ form the orthonormal basis and $\mathbf{R}$ is upper triangular.
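A direct transcription of the Gram-Schmidt steps into the factorization $\mathbf{A} = \mathbf{Q}\mathbf{R}$ might look as follows (a sketch assuming NumPy and a full-rank input; `gram_schmidt_qr` is our own name). Classical Gram-Schmidt can lose orthogonality in floating point, which is why library routines such as `numpy.linalg.qr` rely on Householder reflections instead; the two results agree up to the signs of the rows of $\mathbf{R}$:

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt: A = QR, Q with orthonormal columns, R upper triangular."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # component along an earlier direction
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)       # nonzero when A has full rank
        Q[:, j] = v / R[j, j]
    return Q, R

A = np.array([[2.0, 1.0, 1.0], [1.0, 3.0, 0.0], [0.0, 1.0, 4.0]])
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)), np.allclose(R, np.triu(R)))
```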

Unitary matrices form an important but restricted class of matrices, which can be parametrized in various forms. For example, an $n \times n$ real orthogonal matrix has $n(n-1)/2$ degrees of freedom (up to a permutation of its rows or columns and a sign change in each vector). If we want to find an orthonormal basis for $\mathbb{R}^n$, start with an arbitrary vector and normalize it to have unit norm. This gives $n - 1$ degrees of freedom. Next, choose a norm-1 vector in the orthogonal complement with respect to the first vector, which is of dimension $n - 1$, giving another $n - 2$ degrees of freedom. Iterate until the $n$th vector is chosen, which is unique up to a sign. We have $\sum_{i=0}^{n-1} i = n(n-1)/2$ degrees of freedom. These degrees of freedom can be used in various parametrizations, based either on planar or Givens rotations or on Householder building blocks (see Appendix 2.B).
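The degree-of-freedom count can be made concrete with planar (Givens) rotations: one angle per coordinate pair gives $n(n-1)/2$ parameters, and the product of the rotations is orthogonal with determinant 1. The sketch below assumes NumPy; the helpers `givens` and `orthogonal_from_angles`, and the particular ordering of the rotations, are illustrative choices rather than the parametrization of Appendix 2.B:

```python
import numpy as np
from itertools import combinations

def givens(n, i, k, theta):
    """n x n planar rotation acting on coordinates i and k."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i], G[i, k], G[k, i], G[k, k] = c, -s, s, c
    return G

def orthogonal_from_angles(n, thetas):
    """Product of one Givens rotation per coordinate pair: n(n-1)/2 angles."""
    pairs = list(combinations(range(n), 2))
    if len(thetas) != len(pairs):
        raise ValueError("need n(n-1)/2 angles")
    U = np.eye(n)
    for (i, k), theta in zip(pairs, thetas):
        U = givens(n, i, k, theta) @ U
    return U

n = 4
thetas = np.random.default_rng(5).uniform(-np.pi, np.pi, n * (n - 1) // 2)
U = orthogonal_from_angles(n, thetas)
print(len(thetas), np.allclose(U.T @ U, np.eye(n)), np.isclose(np.linalg.det(U), 1.0))
```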



2.3.5 Special Matrices 

A (right) circulant matrix is a matrix where each row is obtained by a (right) circular shift of the previous row, or

$$\mathbf{C} = \begin{pmatrix} c_0 & c_1 & \cdots & c_{n-1} \\ c_{n-1} & c_0 & \cdots & c_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ c_1 & c_2 & \cdots & c_0 \end{pmatrix}.$$

A Toeplitz matrix is a matrix whose $(i, j)$th entry depends only on the value of $i - j$, and thus it is constant along the diagonals, or

$$\mathbf{T} = \begin{pmatrix} t_0 & t_{-1} & t_{-2} & \cdots \\ t_1 & t_0 & t_{-1} & \cdots \\ t_2 & t_1 & t_0 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}.$$

Sometimes, the elements $t_i$ are matrices themselves, in which case the matrix is called block Toeplitz. Another important matrix is the DFT (Discrete Fourier Transform) matrix. The $(i, k)$th element of the DFT matrix of size $n \times n$ is $W_n^{ik} = e^{-j2\pi ik/n}$. The DFT matrix diagonalizes circulant matrices, that is, its columns and rows are the eigenvectors of circulant matrices (see Section 2.4.8 and Problem 2.18).
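The diagonalization claim is easy to verify numerically. This sketch assumes NumPy; it builds a right circulant matrix with first row $c$ (the helper `circulant` and the vector `c` are hypothetical), forms the DFT matrix with entries $W_n^{ik} = e^{-j2\pi ik/n}$, and checks that $\mathbf{F}^{-1}\mathbf{C}\mathbf{F}$ is diagonal, with the DFT of the first row on its diagonal:

```python
import numpy as np

def circulant(c):
    """Right circulant matrix with first row c: C[i, j] = c[(j - i) mod n]."""
    n = len(c)
    return np.array([[c[(j - i) % n] for j in range(n)] for i in range(n)])

c = np.array([4.0, 1.0, 0.0, 2.0, -1.0])
n = len(c)
C = circulant(c)
idx = np.arange(n)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / n)   # F[i, k] = W_n^{ik}

D = F.conj().T @ C @ F / n                          # F^{-1} = F* / n
print(np.allclose(D, np.diag(np.diag(D))))          # diagonal
print(np.allclose(np.diag(D), np.fft.fft(c)))       # eigenvalues = DFT of the first row
```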

A real symmetric matrix A is called positive definite if all its eigenvalues are 
greater than 0. Equivalently, for all nonzero vectors x, the following is satisfied: 

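$$\mathbf{x}^T\mathbf{A}\mathbf{x} > 0.$$

Finally, for a positive definite matrix $\mathbf{A}$, there exists a nonsingular matrix $\mathbf{W}$ such that

$$\mathbf{A} = \mathbf{W}^T\mathbf{W},$$

where $\mathbf{W}$ is intuitively a "square root" of $\mathbf{A}$. One possible way to choose such a square root is to diagonalize $\mathbf{A}$ as $\mathbf{A} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T$ and then, since all the eigenvalues are positive, choose $\mathbf{W} = \sqrt{\boldsymbol{\Lambda}}\,\mathbf{Q}^T$ (the square root is applied on each eigenvalue in the diagonal matrix $\boldsymbol{\Lambda}$). The above discussion carries over to hermitian symmetric matrices by using hermitian transposes.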
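Two constructions of such a square root can be compared in a short sketch (assuming NumPy, with a randomly generated positive definite matrix). The eigenvalue-based $\mathbf{W} = \sqrt{\boldsymbol{\Lambda}}\,\mathbf{Q}^T$ and the transposed Cholesky factor from `numpy.linalg.cholesky` (which returns a lower-triangular $\mathbf{L}$ with $\mathbf{A} = \mathbf{L}\mathbf{L}^T$) both satisfy $\mathbf{A} = \mathbf{W}^T\mathbf{W}$, illustrating that the square root is not unique:

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((4, 4))
A = B.T @ B + 0.5 * np.eye(4)          # symmetric positive definite by construction

lam, Q = np.linalg.eigh(A)             # A = Q diag(lam) Q^T
print(np.all(lam > 0))
W = np.diag(np.sqrt(lam)) @ Q.T        # W = sqrt(Lambda) Q^T
print(np.allclose(W.T @ W, A))

# Another valid square root: W = L^T from the Cholesky factorization A = L L^T
L = np.linalg.cholesky(A)
print(np.allclose(L @ L.T, A))
```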

2.3.6 Polynomial Matrices 

Since a fair amount of the results given in Chapter 3 will make use of polynomial 
matrices, we will present a brief overview of this subject. For more details, the 
reader is referred to [106], while self-contained presentations on polynomial matrices 
can be found in [150, 308]. 

A polynomial matrix (or a matrix polynomial) is a matrix whose entries are polynomials. The fact that the above two names can be used interchangeably is due to the following forms of a polynomial matrix $\mathbf{H}(x)$:

$$\mathbf{H}(x) = \begin{pmatrix} \sum_i a_i x^i & \cdots & \sum_i b_i x^i \\ \vdots & \ddots & \vdots \\ \sum_i c_i x^i & \cdots & \sum_i d_i x^i \end{pmatrix} = \sum_i \mathbf{H}_i x^i,$$

that is, it can be written either as a matrix containing polynomials as its entries, or a polynomial having matrices as its coefficients.

The question of the rank in polynomial matrices is more subtle. For example, the matrix

$$\begin{pmatrix} a + bx & 3(a + bx) \\ c + dx & \lambda(c + dx) \end{pmatrix},$$

with $\lambda = 3$, always has rank less than 2, since the two columns are proportional to each other. On the other hand, if $\lambda = 2$, then the matrix would have rank less than 2 only if $x = -a/b$ or $x = -c/d$. This leads to the notion of normal rank. First, note that $\mathbf{H}(x)$ is nonsingular only if $\det(\mathbf{H}(x))$ is different from 0 for some $x$. Then, the normal rank of $\mathbf{H}(x)$ is the largest of the orders of minors that have a determinant not identically zero. In the above example, for $\lambda = 3$, the normal rank is 1, while for $\lambda = 2$, the normal rank is 2.

An important class of polynomial matrices are unimodular matrices, whose determinant is not a function of $x$ (that is, it is a nonzero constant). An example is the following matrix:

$$\mathbf{H}(x) = \begin{pmatrix} 1 + x & x \\ 2 + x & 1 + x \end{pmatrix},$$
whose determinant is equal to 1. There are several useful properties pertaining 
to unimodular matrices. For example, the product of two unimodular matrices 
is again unimodular. The inverse of a unimodular matrix is unimodular as well. 
Also, one can prove that a polynomial matrix H{x) is unimodular, if and only if 
its inverse is a polynomial matrix. All these facts can be proven using properties 
of determinants (see, for example, [308]). 
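The unimodularity of the example can be checked with polynomial arithmetic on coefficient arrays. The sketch assumes NumPy's `numpy.polynomial.polynomial` module (coefficients stored in increasing powers of $x$); `det2` and `evaluate` are hypothetical helpers. Since the determinant is 1, the adjugate is the inverse, and it is again a polynomial matrix:

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Entries as coefficient arrays in increasing powers of x: [c0, c1, ...]
H = [[np.array([1.0, 1.0]), np.array([0.0, 1.0])],    # 1 + x,  x
     [np.array([2.0, 1.0]), np.array([1.0, 1.0])]]    # 2 + x,  1 + x

def det2(M):
    """Determinant of a 2 x 2 polynomial matrix, as a trimmed coefficient array."""
    d = P.polysub(P.polymul(M[0][0], M[1][1]), P.polymul(M[0][1], M[1][0]))
    return P.polytrim(d)

print(det2(H))   # a single coefficient equal to 1: no dependence on x

# Since det = 1, the inverse is the adjugate, again a polynomial matrix
H_inv = [[H[1][1], -H[0][1]], [-H[1][0], H[0][0]]]

def evaluate(M, x):
    """Evaluate a polynomial matrix at the point x."""
    return np.array([[P.polyval(x, e) for e in row] for row in M])

print(np.allclose(evaluate(H, 0.37) @ evaluate(H_inv, 0.37), np.eye(2)))
```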

The extension of the concept of unitary matrices to polynomial matrices leads to paraunitary matrices [308] as studied in circuit theory. In fact, these matrices are unitary on the unit circle or the imaginary axis, depending on whether they correspond to discrete-time or continuous-time linear operators ($z$-transforms or Laplace transforms). Consider the discrete-time case and $x = e^{j\omega}$. Then, a square matrix $\mathbf{U}(x)$ is unitary on the unit circle if

$$[\mathbf{U}(e^{j\omega})]^* \mathbf{U}(e^{j\omega}) = \mathbf{U}(e^{j\omega}) [\mathbf{U}(e^{j\omega})]^* = \mathbf{I}.$$

Extending this beyond the unit circle leads to

$$[\mathbf{U}(x^{-1})]^T \mathbf{U}(x) = \mathbf{U}(x) [\mathbf{U}(x^{-1})]^T = \mathbf{I}, \qquad (2.3.7)$$

since $(e^{j\omega})^* = e^{-j\omega}$. If the coefficients of the polynomials are complex, the coefficients need to be conjugated in (2.3.7), which is usually written $[\mathbf{U}_*(x^{-1})]^T$. This will be studied in Chapter 3.
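As a hypothetical example (not taken from the text), $\mathbf{U}(x) = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & x \\ 1 & -x \end{pmatrix}$ has real coefficients and satisfies (2.3.7). The sketch below, assuming NumPy, checks unitarity on the unit circle, checks (2.3.7) at a point off the circle, and shows that the plain conjugate-transpose test fails there, which is why the extension uses $x^{-1}$ rather than conjugation:

```python
import numpy as np

def U(x):
    """Hypothetical example: U(x) = (1/sqrt(2)) [[1, x], [1, -x]], real coefficients."""
    return np.array([[1.0, x], [1.0, -x]], dtype=complex) / np.sqrt(2)

I2 = np.eye(2)
# Unitary on the unit circle: U(e^{jw})^* U(e^{jw}) = I
on_circle = all(np.allclose(U(np.exp(1j * w)).conj().T @ U(np.exp(1j * w)), I2)
                for w in np.linspace(0, 2 * np.pi, 7))
print(on_circle)

# (2.3.7) also holds off the unit circle
x = 0.4 + 1.3j
print(np.allclose(U(1 / x).T @ U(x), I2), np.allclose(U(x) @ U(1 / x).T, I2))
# whereas the conjugate-transpose test fails there, since |x| != 1
print(np.allclose(U(x).conj().T @ U(x), I2))
```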

As a generalization of polynomial matrices, one can consider the case of rational 
matrices. In that case, each entry is a ratio of two polynomials. As will be shown 
in Chapter 3, polynomial matrices in z correspond to finite impulse response (FIR) 
discrete-time filters, while rational matrices can be associated with infinite impulse 
response (IIR) filters. Unimodular and unitary matrices can be defined in the 
rational case, as in the polynomial case. 

2.4 Fourier Theory and Sampling 

This section reviews the Fourier transform and its variations when signals have 
particular properties (such as periodicity). Sampling, which establishes the link between continuous- and discrete-time signal processing, is discussed in detail. Then,
discrete versions of the Fourier transform are examined. The recurring theme is 
that complex exponentials form an orthonormal basis on which many classes of 
signals can be expanded. Also, such complex exponentials are eigenfunctions of 
convolution operators, leading to convolution theorems. The material in this sec- 
tion can be found in many sources, and we refer to [37, 91, 108, 215, 326] for details 
and proofs. 

2.4.1 Signal Expansions and Nomenclature 

Let us start by discussing some naming conventions. First, the signal to be ex- 
panded is either continuous or discrete in time. Then, the expansion involves an 
integral (a transform) or a summation (a series). This leads to four possible com- 
binations of continuous/discrete time and integral/series expansions. Note that in 
the integral case, strictly speaking, we do not have an expansion, but a transform. 
We use lower case and capital letters for the signal and its expansion (or transform) and denote by $\varphi_\omega$ and $\varphi_i$ a continuous and discrete set of basis functions. In general, there is a basis $\{\varphi\}$ and its dual $\{\tilde{\varphi}\}$, which are equal in the orthogonal case. Thus, we have

(a) Continuous-time integral expansion, or transform

$$x(t) = \int X_\omega\, \varphi_\omega(t)\, d\omega \quad \text{with} \quad X_\omega = \langle \tilde{\varphi}_\omega(t), x(t) \rangle.$$

(b) Continuous-time series expansion

$$x(t) = \sum_i X_i\, \varphi_i(t) \quad \text{with} \quad X_i = \langle \tilde{\varphi}_i(t), x(t) \rangle.$$

(c) Discrete-time integral expansion

$$x[n] = \int X_\omega\, \varphi_\omega[n]\, d\omega \quad \text{with} \quad X_\omega = \langle \tilde{\varphi}_\omega[n], x[n] \rangle.$$

(d) Discrete-time series expansion

$$x[n] = \sum_i X_i\, \varphi_i[n] \quad \text{with} \quad X_i = \langle \tilde{\varphi}_i[n], x[n] \rangle.$$

In the classic Fourier cases, this leads to

(a) The continuous-time Fourier transform (CTFT), often simply called the Fourier transform.

(b) The continuous-time Fourier series (CTFS), or simply Fourier series. 

(c) The discrete-time Fourier transform (DTFT). 

(d) The discrete-time Fourier series (DTFS). 

In all the Fourier cases, $\{\varphi\} = \{\tilde{\varphi}\}$. The above transforms and series will be
discussed in this section. Later, more general expansions will be introduced, in par- 
ticular, series expansions of discrete-time signals using filter banks in Chapter 3, 
series expansions of continuous-time signals using wavelets in Chapter 4, and in- 
tegral expansions of continuous-time signals using wavelets and short-time Fourier 
bases in Chapter 5. 

2.4.2 Fourier Transform 

Given an absolutely integrable function $f(t)$, its Fourier transform is defined by

$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt = \langle e^{j\omega t}, f(t) \rangle, \qquad (2.4.1)$$

which is called the Fourier analysis formula. The inverse Fourier transform is given by

$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega)\, e^{j\omega t}\, d\omega, \qquad (2.4.2)$$

or, the Fourier synthesis formula. Note that $e^{j\omega t}$ is not in $L_2(\mathbb{R})$, and that the set $\{e^{j\omega t}\}$ is not countable. The exact conditions under which (2.4.2) is the inverse of (2.4.1) depend on the behavior of $f(t)$ and are discussed in standard texts on Fourier theory [46, 326]. For example, the inversion is exact if $f(t)$ is continuous (or if $f(t)$ is defined as $(f(t^+) + f(t^-))/2$ at a point of discontinuity).⁶

When $f(t)$ is square-integrable, then the formulas above hold in the $L_2$ sense (see Appendix 2.C), that is, calling $\hat{f}(t)$ the result of the analysis followed by the synthesis formula,

$$\|f(t) - \hat{f}(t)\| = 0.$$

Assuming that the Fourier transform and its inverse exist, we will denote by

$$f(t) \longleftrightarrow F(\omega)$$

a Fourier transform pair. The Fourier transform satisfies a number of properties, some of which we briefly review below. For proofs, see [215].

⁶ We assume that $f(t)$ is of bounded variation. That is, for $f(t)$ defined on a closed interval $[a, b]$, there exists a constant $A$ such that $\sum_{n=1}^{N} |f(t_n) - f(t_{n-1})| < A$ for any finite set $\{t_i\}$ satisfying $a \le t_0 < t_1 < \cdots < t_N \le b$. Roughly speaking, the graph of $f(t)$ cannot oscillate over an infinite distance as $t$ goes over a finite interval.

Linearity Since the Fourier transform is an inner product (see (2.4.1)), it follows 
immediately from the linearity of the inner product that 

$$\alpha f(t) + \beta g(t) \longleftrightarrow \alpha F(\omega) + \beta G(\omega).$$

Symmetry If $F(\omega)$ is the Fourier transform of $f(t)$, then

$$F(t) \longleftrightarrow 2\pi f(-\omega), \qquad (2.4.3)$$

which indicates the essential symmetry of the Fourier analysis and synthesis formu- 
las. 

Shifting A shift in time by $t_0$ results in multiplication by a phase factor in the Fourier domain,

$$f(t - t_0) \longleftrightarrow e^{-j\omega t_0} F(\omega). \qquad (2.4.4)$$

Conversely, a shift in frequency results in a phase factor, or modulation by a complex exponential, in the time domain,

$$e^{j\omega_0 t} f(t) \longleftrightarrow F(\omega - \omega_0).$$

Scaling Scaling in time results in inverse scaling in frequency as given by the following transform pair ($a$ is a nonzero real constant):

$$f(at) \longleftrightarrow \frac{1}{|a|} F\!\left(\frac{\omega}{a}\right). \qquad (2.4.5)$$
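The analysis formula (2.4.1) can be approximated by a Riemann sum on a truncated grid. In this sketch (assuming NumPy; the Gaussian pair $e^{-t^2/2} \longleftrightarrow \sqrt{2\pi}\, e^{-\omega^2/2}$ is a standard result used here as a reference), the numerical transform matches the closed form, and a delayed Gaussian picks up the phase factor of the shift property (2.4.4):

```python
import numpy as np

t = np.arange(-20.0, 20.0, 0.01)      # truncated, uniformly sampled time axis
dt = t[1] - t[0]

def fourier_transform(f_vals, omegas):
    """Riemann-sum approximation of (2.4.1): F(w) = integral of f(t) e^{-jwt} dt."""
    return np.array([np.sum(f_vals * np.exp(-1j * w * t)) * dt for w in omegas])

omegas = np.linspace(-4, 4, 9)
F = fourier_transform(np.exp(-t ** 2 / 2), omegas)          # Gaussian test signal
F_exact = np.sqrt(2 * np.pi) * np.exp(-omegas ** 2 / 2)
print(np.allclose(F, F_exact))

t0 = 1.5
F_shift = fourier_transform(np.exp(-(t - t0) ** 2 / 2), omegas)
print(np.allclose(F_shift, np.exp(-1j * omegas * t0) * F_exact))  # shift property (2.4.4)
```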

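Differentiation/Integration Derivatives in time lead to multiplication by $(j\omega)$ in frequency,

$$\frac{d^n f(t)}{dt^n} \longleftrightarrow (j\omega)^n F(\omega), \qquad (2.4.6)$$

if the transform actually exists. Conversely, if $F(0) = 0$, we have

$$\int_{-\infty}^{t} f(\tau)\, d\tau \longleftrightarrow \frac{F(\omega)}{j\omega}.$$

Differentiation in frequency leads to

$$(-jt)^n f(t) \longleftrightarrow \frac{d^n F(\omega)}{d\omega^n}. \qquad (2.4.7)$$

Moments Calling $m_n$ the $n$th moment of $f(t)$,

$$m_n = \int_{-\infty}^{\infty} t^n f(t)\, dt, \qquad n = 0, 1, 2, \ldots,$$

the moment theorem of the Fourier transform states that

$$(-j)^n m_n = \left. \frac{\partial^n F(\omega)}{\partial \omega^n} \right|_{\omega = 0}, \qquad n = 0, 1, 2, \ldots. \qquad (2.4.8)$$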



Convolution The convolution of two functions $f(t)$ and $g(t)$ is given by

$$h(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau, \qquad (2.4.9)$$

and is denoted $h(t) = f(t) * g(t) = g(t) * f(t)$ since (2.4.9) is symmetric in $f(t)$ and $g(t)$. Denoting by $F(\omega)$ and $G(\omega)$ the Fourier transforms of $f(t)$ and $g(t)$, respectively, the convolution theorem states that

$$f(t) * g(t) \longleftrightarrow F(\omega)\, G(\omega).$$

This result is fundamental, and we will prove it for $f(t)$ and $g(t)$ being in $L_1(\mathbb{R})$. Taking the Fourier transform of $f(t) * g(t)$,

$$\int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau \right] e^{-j\omega t}\, dt,$$

changing the order of integration (which is allowed when $f(t)$ and $g(t)$ are in $L_1(\mathbb{R})$; see Fubini's theorem in [73, 250]) and using the shift property, we get

$$\int_{-\infty}^{\infty} f(\tau) \left[ \int_{-\infty}^{\infty} g(t - \tau)\, e^{-j\omega t}\, dt \right] d\tau = \int_{-\infty}^{\infty} f(\tau)\, e^{-j\omega\tau}\, G(\omega)\, d\tau = F(\omega)\, G(\omega).$$

The result holds as well when $f(t)$ and $g(t)$ are square-integrable, but requires a different proof [108].

An alternative view of the convolution theorem is to identify the complex exponentials $e^{j\omega t}$ as the eigenfunctions of the convolution operator, since

$$\int_{-\infty}^{\infty} e^{j\omega(t - \tau)}\, g(\tau)\, d\tau = e^{j\omega t} \int_{-\infty}^{\infty} e^{-j\omega\tau}\, g(\tau)\, d\tau = e^{j\omega t}\, G(\omega).$$

The associated eigenvalue $G(\omega)$ is simply the Fourier transform of the impulse response $g(\tau)$ at frequency $\omega$.

By symmetry, the product of time-domain functions leads to the convolution of their Fourier transforms,

$$f(t)\, g(t) \longleftrightarrow \frac{1}{2\pi} F(\omega) * G(\omega). \qquad (2.4.10)$$

This is known as the modulation theorem of the Fourier transform.

As an application of both the convolution theorem and the derivative property, consider taking the derivative of a convolution,

$$h'(t) = \frac{d}{dt}\big[f(t) * g(t)\big].$$

The Fourier transform of $h'(t)$, following (2.4.6), is equal to

$$j\omega\, \big(F(\omega)\, G(\omega)\big) = \big(j\omega F(\omega)\big)\, G(\omega) = F(\omega)\, \big(j\omega G(\omega)\big),$$

that is,

$$h'(t) = f'(t) * g(t) = f(t) * g'(t).$$

This is useful when convolving a signal with a filter which is known to be the 
derivative of a given function such as a Gaussian, since one can think of the result 
as being the convolution of the derivative of the signal with a Gaussian. 
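The convolution theorem can be spot-checked with two Gaussians, $f(t) = e^{-t^2/2}$ and $g(t) = e^{-t^2}$, whose convolution is $\sqrt{2\pi/3}\, e^{-t^2/3}$ and whose transforms multiply to $\pi\sqrt{2}\, e^{-3\omega^2/4}$. This is a sketch assuming NumPy, with (2.4.9) approximated by `numpy.convolve` scaled by the grid step and the transforms by Riemann sums; the grid and the test frequencies are arbitrary choices:

```python
import numpy as np

t = np.arange(-20.0, 20.0, 0.01)
dt = t[1] - t[0]
f = np.exp(-t ** 2 / 2)
g = np.exp(-t ** 2)

h = np.convolve(f, g) * dt                   # Riemann sum of (2.4.9), length 2N - 1
t_h = 2 * t[0] + dt * np.arange(h.size)      # time axis of the full convolution
print(np.allclose(h, np.sqrt(2 * np.pi / 3) * np.exp(-t_h ** 2 / 3), atol=1e-9))

def ft(vals, times, omegas):
    """Riemann-sum approximation of (2.4.1) on a uniform grid."""
    step = times[1] - times[0]
    return np.array([np.sum(vals * np.exp(-1j * w * times)) * step for w in omegas])

omegas = np.linspace(-3, 3, 7)
H = ft(h, t_h, omegas)
print(np.allclose(H, ft(f, t, omegas) * ft(g, t, omegas)))        # F(w) G(w)
print(np.allclose(H, np.pi * np.sqrt(2) * np.exp(-3 * omegas ** 2 / 4)))
```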

Parseval's Formula Because the Fourier transform is an orthogonal transform, it satisfies an energy conservation relation known as Parseval's formula. See also Section 2.2.3 where we proved Parseval's formula for orthonormal bases. Here, we need a different proof because the Fourier transform does not correspond to an orthonormal basis expansion (first, exponentials are not in $L_2(\mathbb{R})$, and also the complex exponentials are uncountable, whereas we considered countable orthonormal bases [113]). The general form of Parseval's formula for the Fourier transform is given by

$$\int_{-\infty}^{\infty} f^*(t)\, g(t)\, dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} F^*(\omega)\, G(\omega)\, d\omega, \qquad (2.4.11)$$

which reduces, when $g(t) = f(t)$, to

$$\int_{-\infty}^{\infty} |f(t)|^2\, dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} |F(\omega)|^2\, d\omega. \qquad (2.4.12)$$

Note that the factor $1/2\pi$ comes from our definition of the Fourier transform (2.4.1-2.4.2). A symmetric definition, with a factor $1/\sqrt{2\pi}$ in both the analysis and synthesis formulas (see, for example, [73]), would remove the scale factor in (2.4.12). The proof of (2.4.11) uses the fact that

$$f^*(t) \longleftrightarrow F^*(-\omega)$$

and the frequency-domain convolution relation (2.4.10). That is, since $f^*(t) \cdot g(t)$ has Fourier transform $(1/2\pi)(F^*(-\omega) * G(\omega))$, we have

$$\int_{-\infty}^{\infty} f^*(t)\, g(t)\, e^{-j\omega t}\, dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} F^*(-\Omega)\, G(\omega - \Omega)\, d\Omega,$$

where (2.4.11) follows by setting $\omega = 0$.
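Parseval's formula can be checked numerically on the standard pairs $e^{-a|t|} \longleftrightarrow 2a/(a^2 + \omega^2)$ for $a = 1, 2$ (closed forms assumed here, not derived in the text). The sketch assumes NumPy and uses a small trapezoidal-rule helper on truncated grids. Both sides of (2.4.12) come out close to 1 and both sides of (2.4.11) close to $2/3$, with the factor $1/2\pi$ required by the convention (2.4.1-2.4.2):

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule on a (possibly non-uniform) grid."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

t = np.linspace(-40.0, 40.0, 400001)
w = np.linspace(-1000.0, 1000.0, 400001)

f, F = np.exp(-np.abs(t)), 2 / (1 + w ** 2)        # a = 1
g, G = np.exp(-2 * np.abs(t)), 4 / (4 + w ** 2)    # a = 2

energy_time = trapezoid(np.abs(f) ** 2, t)
energy_freq = trapezoid(np.abs(F) ** 2, w) / (2 * np.pi)
print(energy_time, energy_freq)                     # both close to 1, as in (2.4.12)

inner_time = trapezoid(np.conj(f) * g, t)
inner_freq = trapezoid(np.conj(F) * G, w) / (2 * np.pi)
print(inner_time, inner_freq)                       # both close to 2/3, as in (2.4.11)
```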

2.4.3 Fourier Series 

A periodic function $f(t)$ with period $T$,

$$f(t + T) = f(t),$$

can be expressed as a linear combination of complex exponentials with frequencies $k\omega_0$ where $\omega_0 = 2\pi/T$. In other words,

$$f(t) = \sum_{k=-\infty}^{\infty} F[k]\, e^{jk\omega_0 t}, \qquad (2.4.13)$$

with

$$F[k] = \frac{1}{T} \int_{-T/2}^{T/2} f(t)\, e^{-jk\omega_0 t}\, dt. \qquad (2.4.14)$$

If $f(t)$ is continuous, then the series converges uniformly to $f(t)$. If a period of $f(t)$ is square-integrable but not necessarily continuous, then the series converges to $f(t)$ in the $L_2$ sense; that is, calling $f_N(t)$ the truncated series with $k$ going from $-N$ to $N$, the error $\|f(t) - f_N(t)\|$ goes to zero as $N \to \infty$. At points of discontinuity, the infinite sum (2.4.13) equals the average $(f(t^+) + f(t^-))/2$. However, convergence is not uniform anymore but plagued by the Gibbs phenomenon. That is, $f_N(t)$ will overshoot or undershoot near the point of discontinuity. The amount of over/undershooting is independent of the number of terms $N$ used in the approximation. Only the width diminishes as $N$ is increased.⁷ For further discussions on the convergence of Fourier series, see Appendix 2.C and [46, 326].
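The Gibbs phenomenon is easy to observe numerically. This sketch assumes NumPy and uses the $2\pi$-periodic square wave $\operatorname{sign}(\sin t)$, whose Fourier series is $\frac{4}{\pi}\sum_{k \text{ odd}} \sin(kt)/k$ (a standard result, written here in real form rather than as (2.4.13)). The peak of the partial sum just to the right of the jump stays near $1.179$, an overshoot of about 9% of the jump height of 2, whether 50 or 200 terms are used; only its location moves toward the discontinuity:

```python
import numpy as np

def square_wave_partial_sum(t, K):
    """Partial Fourier sum of the 2*pi-periodic wave sign(sin t), odd harmonics up to K."""
    k = np.arange(1, K + 1, 2)
    return (4 / np.pi) * (np.sin(np.outer(t, k)) @ (1.0 / k))

t = np.linspace(1e-6, 0.2, 20001)       # fine grid just to the right of the jump at t = 0
peaks = {K: square_wave_partial_sum(t, K).max() for K in (99, 399)}
for K, peak in peaks.items():
    print(K, round(float(peak), 4))     # both close to 1.179: the overshoot does not shrink
```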

Of course, underlying the Fourier series construction is the fact that the set of functions used in the expansion (2.4.13) is a complete orthonormal system for the interval $[-T/2, T/2]$ (up to a scale factor). That is, defining $\varphi_k(t) = (1/\sqrt{T})\, e^{jk\omega_0 t}$ for $t$ in $[-T/2, T/2]$ and $k$ in $\mathbb{Z}$, we can verify that

$$\langle \varphi_k(t), \varphi_l(t) \rangle_{[-T/2,\, T/2]} = \delta[k - l].$$

When $k = l$, the inner product equals 1. If $k \neq l$, we have

$$\frac{1}{T} \int_{-T/2}^{T/2} e^{j\frac{2\pi}{T}(l - k)t}\, dt = \frac{\sin(\pi(l - k))}{\pi(l - k)} = 0.$$

⁷ Again, we consider nonpathological functions (that is, of bounded variation).



That the set $\{\varphi_k\}$ is complete is shown in [326] and means that there exists no
periodic function f(t) with L2 norm greater than zero that has all its Fourier series 
coefficients equal to zero. Actually, there is equivalence between norms, as shown 
below. 

Parseval's Relation With the Fourier series coefficients as defined in (2.4.14), and the inner product of periodic functions taken over one period, we have

$$\langle f(t), g(t) \rangle_{[-T/2,\, T/2]} = T\, \langle F[k], G[k] \rangle,$$

where the factor $T$ is due to the normalization chosen in (2.4.13-2.4.14). In particular, for $g(t) = f(t)$,

$$\|f(t)\|^2_{[-T/2,\, T/2]} = T\, \|F[k]\|^2.$$

This is an example of Theorem 2.4, up to the scaling factor $T$.

Best Approximation Property While the following result is true in a more general setting (see Section 2.2.3), it is sufficiently important to be restated for Fourier series, namely

$$\Big\| f(t) - \sum_{k=-N}^{N} \langle \varphi_k, f \rangle\, \varphi_k(t) \Big\| \le \Big\| f(t) - \sum_{k=-N}^{N} a_k\, \varphi_k(t) \Big\|,$$

where $\{a_k\}$ is an arbitrary set of coefficients. That is, the Fourier series coefficients are the best ones for an approximation in the span of $\{\varphi_k(t)\}$, $k = -N, \ldots, N$. Moreover, if $N$ is increased, new coefficients are added without affecting the previous ones.

Fourier series, beside their obvious use for characterizing periodic signals, are 
useful for problems of finite size through periodization. The immediate concern, 
however, is the introduction of a discontinuity at the boundary, since periodization 
of a continuous signal on an interval results, in general, in a discontinuous periodic 
signal. 

Fourier series can be related to the Fourier transform seen earlier by using sequences of Dirac functions which are also used in sampling. We will turn our attention to these functions next.

2.4.4 Dirac Function, Impulse Trains and Poisson Sum Formula

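The Dirac function [215], which is a generalized function or distribution, is defined as a limit of rectangular functions. For example, if

$$\delta_\varepsilon(t) = \begin{cases} 1/\varepsilon, & 0 \le t < \varepsilon, \\ 0, & \text{otherwise}, \end{cases} \qquad (2.4.15)$$

then $\delta(t) = \lim_{\varepsilon \to 0} \delta_\varepsilon(t)$. More generally, one can use any smooth function $\psi(t)$ with integral 1 and define [278]

$$\delta(t) = \lim_{\varepsilon \to 0} \frac{1}{\varepsilon}\, \psi\!\left(\frac{t}{\varepsilon}\right).$$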
Any operation involving a Dirac function requires a limiting operation. Since we are 
reviewing standard results, and for notational convenience, we will skip the limiting 
process. However, let us emphasize that Dirac functions have to be handled with 
care in order to get meaningful results. When in doubt, it is best to go back to the 
definition and the limiting process. For details see, for example, [215]. It follows 
from (2.4.15) that 

S(t) dt = 1, (2.4.16) 



as well as 

f(t-t )S(t)dt = f(t)5(t-t )dt = f(t ). (2.4.17) 

) J — oo 

Actually, the preceding two relations can be used as an alternative definition of 
the Dirac function. That is, the Dirac function is a linear operator over a class of 
functions satisfying (2.4.16-2.4.17). From the above, it follows that 

f(t)*S(t-t ) = f(t-to). (2.4.18) 

One more standard relation useful for the Dirac function is [215]

$$f(t)\, \delta(t) = f(0)\, \delta(t).$$

The Fourier transform of $\delta(t - t_0)$ is, from (2.4.1) and (2.4.17), equal to

$$\delta(t - t_0) \longleftrightarrow e^{-j\omega t_0}.$$

Using the symmetry property (2.4.3) and the previous results, we see that

$$e^{j\omega_0 t} \longleftrightarrow 2\pi\, \delta(\omega - \omega_0). \tag{2.4.19}$$



3 Note that this holds only for points of continuity. 
According to the above and using the modulation theorem (2.4.10), $f(t)\, e^{j\omega_0 t}$ has Fourier transform $F(\omega - \omega_0)$.

Next, we introduce the train of Dirac functions spaced $T > 0$ apart, denoted $s_T(t)$ and given by

$$s_T(t) = \sum_{n=-\infty}^{\infty} \delta(t - nT). \tag{2.4.20}$$

Before getting its Fourier transform, we derive the Poisson sum formula. Note that, given a function $f(t)$ and using (2.4.18),

$$\int_{-\infty}^{\infty} f(\tau)\, s_T(t - \tau)\, d\tau = \sum_{n=-\infty}^{\infty} f(t - nT). \tag{2.4.21}$$

Call the above $T$-periodic function $f_0(t)$. Further assume that $f(t)$ is sufficiently smooth and decaying rapidly such that the above series converges uniformly to $f_0(t)$. We can then expand $f_0(t)$ into a uniformly convergent Fourier series

$$f_0(t) = \sum_{k=-\infty}^{\infty} \left( \frac{1}{T} \int_{-T/2}^{T/2} f_0(\tau)\, e^{-j2\pi k\tau/T}\, d\tau \right) e^{j2\pi kt/T}.$$

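Consider the Fourier series coefficient in the above formula, using the expression for $f_0(t)$ in (2.4.21):

$$\int_{-T/2}^{T/2} f_0(\tau)\, e^{-j2\pi k\tau/T}\, d\tau = \sum_{n=-\infty}^{\infty} \int_{(2n-1)T/2}^{(2n+1)T/2} f(\tau)\, e^{-j2\pi k\tau/T}\, d\tau = \int_{-\infty}^{\infty} f(\tau)\, e^{-j2\pi k\tau/T}\, d\tau = F\!\left(\frac{2\pi k}{T}\right),$$

where each term of the sum follows from shifting the integration variable by a multiple of $T$ (which leaves $e^{-j2\pi k\tau/T}$ unchanged), and the intervals $[(2n-1)T/2, (2n+1)T/2]$ together cover the real line. This leads to the Poisson sum formula.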

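THEOREM 2.5 Poisson Sum Formula

For a function $f(t)$ with sufficient smoothness and decay,

$$\sum_{n=-\infty}^{\infty} f(t - nT) = \frac{1}{T} \sum_{k=-\infty}^{\infty} F\!\left(\frac{2\pi k}{T}\right) e^{j2\pi kt/T}. \tag{2.4.22}$$

In particular, taking $T = 1$ and $t = 0$,

$$\sum_{n=-\infty}^{\infty} f(n) = \sum_{k=-\infty}^{\infty} F(2\pi k).$$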
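As a numerical sanity check of (2.4.22), the following minimal Python sketch compares both sides for a Gaussian. It assumes $f(t) = e^{-t^2/2}$, whose Fourier transform with the convention $F(\omega) = \int f(t)\, e^{-j\omega t}\, dt$ is $\sqrt{2\pi}\, e^{-\omega^2/2}$, and it truncates both sums to $|n|, |k| \le K$; the function names are illustrative only.

```python
import cmath
import math


def gaussian(t):
    # f(t) = exp(-t^2/2): smooth and rapidly decaying, as Theorem 2.5 requires
    return math.exp(-t * t / 2)


def gaussian_ft(w):
    # F(w) = integral of f(t) exp(-j w t) dt = sqrt(2 pi) exp(-w^2/2)
    return math.sqrt(2 * math.pi) * math.exp(-w * w / 2)


def poisson_sides(t, T, K=50):
    """Return (left, right) sides of (2.4.22), both sums truncated to |n|, |k| <= K."""
    left = sum(gaussian(t - n * T) for n in range(-K, K + 1))
    right = sum(
        gaussian_ft(2 * math.pi * k / T) * cmath.exp(2j * math.pi * k * t / T)
        for k in range(-K, K + 1)
    ) / T
    # F is even, so the imaginary parts of the k and -k terms cancel
    return left, right.real


# T = 1, t = 0: sum_n f(n) versus sum_k F(2 pi k); both are approximately 2.5066
print(poisson_sides(0.0, 1.0))
```

Because the Gaussian decays quickly in both domains, a few dozen terms suffice here; a slowly decaying $f(t)$ would need far more terms, in line with the smoothness and decay assumptions of the theorem.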
One can use the Poisson formula to derive the Fourier transform of the impulse train $s_T(t)$ in (2.4.20). It can be shown that

$$S_T(\omega) = \frac{2\pi}{T} \sum_{k=-\infty}^{\infty} \delta\!\left(\omega - \frac{2\pi k}{T}\right). \tag{2.4.23}$$

We have explained that sampling the spectrum and periodizing the time-domain function are equivalent. We will now see the dual situation, in which sampling the time-domain function leads to a periodized spectrum. This is also an immediate application of the Poisson formula.

2.4.5 Sampling 

The process of sampling is central to discrete-time signal processing, since it provides the link with the continuous-time domain. Call $f_T(t)$ the sampled version of $f(t)$, obtained as

$$f_T(t) = f(t)\, s_T(t) = \sum_{n=-\infty}^{\infty} f(nT)\, \delta(t - nT). \tag{2.4.24}$$



Using the modulation theorem of the Fourier transform (2.4.10) and the transform of $s_T(t)$ given in (2.4.23), we get

$$F_T(\omega) = F(\omega) * \frac{1}{T} \sum_{k=-\infty}^{\infty} \delta\!\left(\omega - k\frac{2\pi}{T}\right) = \frac{1}{T} \sum_{k=-\infty}^{\infty} F\!\left(\omega - k\frac{2\pi}{T}\right), \tag{2.4.25}$$

where we used (2.4.18). Thus, $F_T(\omega)$ is periodic with period $2\pi/T$, and is obtained by overlapping copies of $F(\omega)$ at every multiple of $2\pi/T$. Another way to prove (2.4.25) is to use the Poisson formula. Taking the Fourier transform of (2.4.24) results in

$$F_T(\omega) = \sum_{n=-\infty}^{\infty} f(nT)\, e^{-jnT\omega},$$

since $f_T(t)$ is a weighted sequence of Dirac functions with weights $f(nT)$ and shifts of $nT$. To use the Poisson formula, consider the function $g_\Omega(t) = f(t)\, e^{-j\Omega t}$, which has Fourier transform $G_\Omega(\omega) = F(\omega + \Omega)$ according to (2.4.19). Now, applying (2.4.22) to $g_\Omega(t)$ at $t = 0$, we find

$$\sum_{n=-\infty}^{\infty} g_\Omega(nT) = \frac{1}{T} \sum_{k=-\infty}^{\infty} G_\Omega\!\left(\frac{2\pi k}{T}\right),$$
or, changing $\Omega$ to $\omega$ and switching the sign of $k$,

$$\sum_{n=-\infty}^{\infty} f(nT)\, e^{-jnT\omega} = \frac{1}{T} \sum_{k=-\infty}^{\infty} F\!\left(\omega - \frac{2\pi k}{T}\right), \tag{2.4.26}$$

which is the desired result (2.4.25).

Equation (2.4.25) leads immediately to the famous sampling theorem of Whittaker, Kotelnikov and Shannon. If the sampling frequency $\omega_s = 2\pi/T$ is larger than $2\omega_m$ (where $F(\omega)$ is bandlimited⁹ to $\omega_m$), then we can extract one instance of the spectrum without overlap. If this were not true, then, for example for $k = 0$ and $k = 1$, $F(\omega)$ and $F(\omega - 2\pi/T)$ would overlap and reconstruction would not be possible.
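The overlap argument can be made concrete with a sinusoid: sampling with period $T$ cannot distinguish $\cos(\omega_0 t)$ from $\cos((\omega_0 + 2\pi/T)t)$, because their spectral lines are exactly one period $2\pi/T$ apart in (2.4.25). This is a minimal NumPy sketch; the sampling period and frequencies are arbitrary choices made for illustration.

```python
import numpy as np

T = 0.1                        # sampling period, so omega_s = 2 pi / T (about 62.8 rad/s)
omega_s = 2 * np.pi / T
omega_0 = 5.0                  # below omega_s / 2, so this tone alone is not aliased
n = np.arange(50)

x_low = np.cos(omega_0 * n * T)
x_high = np.cos((omega_0 + omega_s) * n * T)   # moved up by one spectral period

# cos((omega_0 + omega_s) n T) = cos(omega_0 n T + 2 pi n): identical samples
print(np.max(np.abs(x_low - x_high)))          # on the order of floating-point error
```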

THEOREM 2.6 Sampling Theorem

If $f(t)$ is continuous and bandlimited to $\omega_m$, then $f(t)$ is uniquely defined by its samples taken at twice $\omega_m$, or $f(n\pi/\omega_m)$. The minimum sampling frequency is $\omega_s = 2\omega_m$ and $T = \pi/\omega_m$ is the maximum sampling period. Then $f(t)$ can be recovered by the interpolation formula

$$f(t) = \sum_{n=-\infty}^{\infty} f(nT)\, \mathrm{sinc}_T(t - nT), \tag{2.4.27}$$

where

$$\mathrm{sinc}_T(t) = \frac{\sin(\pi t/T)}{\pi t/T}.$$

Note that $\mathrm{sinc}_T(nT) = \delta[n]$, that is, it has the interpolation property since it is 1 at the origin but 0 at nonzero multiples of $T$. It follows immediately that (2.4.27) holds at the sampling instants $t = nT$.



Proof

The proof that (2.4.27) is valid for all $t$ goes as follows: Consider the sampled version of $f(t)$, $f_T(t)$, consisting of weighted Dirac functions (2.4.24). We showed that its Fourier transform is given by (2.4.25). The sampling frequency $\omega_s$ equals $2\omega_m$, where $\omega_m$ is the bandlimiting frequency of $F(\omega)$. Thus, $F(\omega - k\omega_s)$ and $F(\omega - l\omega_s)$ do not overlap for $k \neq l$. To recover $F(\omega)$, it suffices to keep the term with $k = 0$ in (2.4.25) and normalize it by $T$. This is accomplished with a function that has a Fourier transform which is equal to $T$ from $-\omega_m$ to $\omega_m$ and 0 elsewhere. This is called an ideal lowpass filter. Its time-domain impulse response, denoted $\mathrm{sinc}_T(t)$ where $T = \pi/\omega_m$, is equal to (taking the inverse Fourier transform)

$$\mathrm{sinc}_T(t) = \frac{1}{2\pi} \int_{-\omega_m}^{\omega_m} T\, e^{j\omega t}\, d\omega = \frac{T}{2\pi jt} \left( e^{j\pi t/T} - e^{-j\pi t/T} \right) = \frac{\sin(\pi t/T)}{\pi t/T}. \tag{2.4.28}$$



9 We will say that a function $f(t)$ is bandlimited to $\omega_m$ if its Fourier transform $F(\omega) = 0$ for $|\omega| > \omega_m$.
Convolving $f_T(t)$ with $\mathrm{sinc}_T(t)$ filters out the repeated spectra (terms with $k \neq 0$ in (2.4.25)) and recovers $f(t)$, as is clear in the frequency domain. Because $f_T(t)$ is a sequence of Dirac functions of weights $f(nT)$, the convolution results in a weighted sum of shifted impulse responses,

$$\left( \sum_{n=-\infty}^{\infty} f(nT)\, \delta(t - nT) \right) * \mathrm{sinc}_T(t) = \sum_{n=-\infty}^{\infty} f(nT)\, \mathrm{sinc}_T(t - nT),$$

proving (2.4.27).
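The interpolation formula (2.4.27) can be tried numerically. The sketch below assumes the test signal $f(t) = \left(\sin(\pi t/2)/(\pi t/2)\right)^2$, whose spectrum is a triangle supported on $|\omega| \le \pi$, so that $\omega_m = \pi$. It samples slightly faster than required ($T = 0.8 < \pi/\omega_m = 1$) and truncates the series to $|n| \le 2000$. NumPy's `np.sinc(x)` computes $\sin(\pi x)/(\pi x)$, so $\mathrm{sinc}_T(t)$ is `np.sinc(t / T)`; the helper names are illustrative.

```python
import numpy as np


def f(t):
    # (sin(pi t/2) / (pi t/2))^2: its spectrum is a triangle on |w| <= pi
    return np.sinc(t / 2) ** 2


def sinc_interpolate(samples, n, T, t):
    """Evaluate sum_n f(nT) sinc_T(t - nT), i.e. (2.4.27), at the points t."""
    t = np.atleast_1d(t)[:, None]            # one row per evaluation point
    return np.sum(samples * np.sinc((t - n * T) / T), axis=1)


T = 0.8                                       # any T <= pi / omega_m = 1 works
n = np.arange(-2000, 2001)                    # truncation of the infinite series
samples = f(n * T)

t_eval = np.linspace(-5.0, 5.0, 101)          # mostly points between samples
error = np.max(np.abs(sinc_interpolate(samples, n, T, t_eval) - f(t_eval)))
print(error)                                  # small; set by the truncation
```

The residual error here comes only from truncating the sum; the sinc kernel decays like $1/|t|$, which is why practical systems usually replace it with a shorter, windowed interpolator.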



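An alternative interpretation of the sampling theorem is as a series expansion on an orthonormal basis for bandlimited signals. Define

$$\varphi_{n,T}(t) = \frac{1}{\sqrt{T}}\, \mathrm{sinc}_T(t - nT), \tag{2.4.29}$$

whose Fourier transform magnitude is $\sqrt{T}$ from $-\omega_m$ to $\omega_m$, and 0 otherwise. One can verify that the $\varphi_{n,T}(t)$ form an orthonormal set using Parseval's relation. The Fourier transform of (2.4.29) is (from (2.4.28) and the shift property (2.4.4))

$$\Phi_{n,T}(\omega) = \begin{cases} \sqrt{T}\, e^{-j\omega nT}, & -\omega_m \le \omega \le \omega_m, \\ 0, & \text{otherwise}, \end{cases}$$

where $T = \pi/\omega_m$. From (2.4.11), we find

$$\langle \varphi_{n,T}, \varphi_{k,T} \rangle = \frac{1}{2\omega_m} \int_{-\omega_m}^{\omega_m} e^{j\omega(n-k)\pi/\omega_m}\, d\omega = \delta[n - k].$$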

Now, assume a bandlimited signal $f(t)$ and consider the inner product $\langle \varphi_{n,T}, f \rangle$. Again using Parseval's relation,

$$\langle \varphi_{n,T}, f \rangle = \frac{\sqrt{T}}{2\pi} \int_{-\omega_m}^{\omega_m} e^{j\omega nT}\, F(\omega)\, d\omega = \sqrt{T}\, f(nT),$$

because the integral is recognized as the inverse Fourier transform of $F(\omega)$ at $t = nT$ (restricting the integration to $[-\omega_m, \omega_m]$ does not change the result because $F(\omega)$ is bandlimited to $\omega_m$). Therefore, another way to write the interpolation formula (2.4.27) is

$$f(t) = \sum_{n=-\infty}^{\infty} \langle \varphi_{n,T}, f \rangle\, \varphi_{n,T}(t) \tag{2.4.30}$$

(the only change is that we normalized the sinc basis functions to have unit norm).

What happens if $f(t)$ is not bandlimited? Because $\{\varphi_{n,T}\}$ is an orthonormal set, the interpolation formula (2.4.30) represents the orthogonal projection of the input signal onto the subspace of signals bandlimited to $\omega_m$. Another way to write the inner product in (2.4.30) is

$$\langle \varphi_{n,T}, f \rangle = \int_{-\infty}^{\infty} \varphi_{0,T}(\tau - nT)\, f(\tau)\, d\tau = \varphi_{0,T}(-t) * f(t) \Big|_{t=nT},$$

which equals $\varphi_{0,T}(t) * f(t) \big|_{t=nT}$ since $\varphi_{0,T}(t)$ is real and symmetric in $t$. That is, the inner products, or coefficients, in the interpolation formula are simply the outputs of an ideal lowpass filter with cutoff $\pi/T$ sampled at multiples of $T$. This is the usual view of the sampling theorem as a bandlimiting convolution followed by sampling and reinterpolation.

To conclude this section, we will demonstrate a fact that will be used in Chapter 4. It states that the following can be seen as a Fourier transform pair:

$$\langle f(t), f(t + n) \rangle = \delta[n] \quad \longleftrightarrow \quad \sum_{k \in \mathbb{Z}} |F(\omega + 2k\pi)|^2 = 1. \tag{2.4.31}$$

The left side of the equation is simply the deterministic autocorrelation¹⁰ of $f(t)$ evaluated at integers, that is, the sampled autocorrelation. If we denote the autocorrelation of $f(t)$ as $p(\tau) = \langle f(t), f(t + \tau) \rangle$, then the left side of (2.4.31) is $p_1(\tau) = p(\tau)\, s_1(\tau)$, where $s_1(\tau)$ is as defined in (2.4.20) with $T = 1$. The Fourier transform of $p_1(\tau)$ is (apply (2.4.25))

$$P_1(\omega) = \sum_{k \in \mathbb{Z}} P(\omega - 2k\pi).$$

Since the Fourier transform of $p(\tau)$ is $P(\omega) = |F(\omega)|^2$, the Fourier transform of the left side of (2.4.31) is the right side of (2.4.31): when $p(n) = \delta[n]$, we have $p_1(\tau) = \delta(\tau)$, whose Fourier transform is 1.
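Relation (2.4.31) can be checked for the box function $f(t) = 1$ on $[0, 1)$ and 0 elsewhere, an example assumed here only for illustration. Its integer shifts are orthonormal (disjoint supports, unit norm), and $|F(\omega)|^2 = \sin^2(\omega/2)/(\omega/2)^2$, so the periodized sum should equal 1 at every $\omega$. The sketch truncates the sum to $|k| \le K$; the tail shrinks like $1/K$, which sets the tolerance.

```python
import numpy as np


def box_spectrum_sq(w):
    # |F(w)|^2 for f(t) = 1 on [0, 1): (sin(w/2) / (w/2))^2
    return np.sinc(w / (2 * np.pi)) ** 2


def periodized_energy(w, K=100_000):
    """Truncated right side of (2.4.31): sum over |k| <= K of |F(w + 2 k pi)|^2."""
    k = np.arange(-K, K + 1)
    return np.sum(box_spectrum_sq(w + 2 * np.pi * k))


for w in (0.0, 0.5, 2.0, np.pi):
    print(w, periodized_energy(w))   # each value is close to 1
```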

2.4.6 Discrete-Time Fourier Transform 

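Given a sequence $\{f[n]\}_{n \in \mathbb{Z}}$, its discrete-time Fourier transform (DTFT) is defined by

$$F(e^{j\omega}) = \sum_{n=-\infty}^{\infty} f[n]\, e^{-j\omega n}, \tag{2.4.32}$$

which is $2\pi$-periodic. Its inverse is given by

$$f[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} F(e^{j\omega})\, e^{j\omega n}\, d\omega. \tag{2.4.33}$$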

A sufficient condition for the convergence of (2.4.32) is that the sequence $f[n]$ be absolutely summable. Then, convergence is uniform to a continuous function of $\omega$ [211]. If the sequence is square-summable, then we have mean square convergence of the series in (2.4.32) (that is, the energy of the error goes to zero as the summation limits go to infinity). By using distributions, one can define discrete-time transforms of more general sequences as well, for example [211]

$$e^{j\omega_0 n} \longleftrightarrow 2\pi \sum_{k=-\infty}^{\infty} \delta(\omega - \omega_0 + 2\pi k).$$

10 The deterministic autocorrelation of a real function $f(t)$ is $f(t) * f(-t) = \int f(\tau)\, f(\tau + t)\, d\tau$.

Comparing (2.4.32-2.4.33) with the equivalent expressions for Fourier series (2.4.13- 
2.4.14), one can see that they are duals of each other (within scale factors). Fur- 
thermore, if the sequence f[n] is obtained by sampling a continuous-time function 
f(t) at instants nT, 

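$$f[n] = f(nT), \tag{2.4.34}$$

then the discrete-time Fourier transform is related to the Fourier transform of $f(t)$. Denoting the latter by $F_c(\omega)$, the Fourier transform of its sampled version is equal to (see (2.4.26))

$$F_T(\omega) = \sum_{n=-\infty}^{\infty} f(nT)\, e^{-jnT\omega} = \frac{1}{T} \sum_{k=-\infty}^{\infty} F_c\!\left(\omega - k\frac{2\pi}{T}\right). \tag{2.4.35}$$

Now consider (2.4.32) at $\omega T$ and use (2.4.34), thus

$$F(e^{j\omega T}) = \sum_{n=-\infty}^{\infty} f(nT)\, e^{-jn\omega T}$$

and, using (2.4.35),

$$F(e^{j\omega T}) = \frac{1}{T} \sum_{k=-\infty}^{\infty} F_c\!\left(\omega - k\frac{2\pi}{T}\right). \tag{2.4.36}$$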

Because of these close relationships with the Fourier transform and Fourier series, 
it follows that all properties seen earlier carry over and we will only repeat two of 
the most important ones (for others, see [211]). 
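The definition (2.4.32) and the inverse (2.4.33) can be checked numerically on a finite-length sequence, which is trivially absolutely summable. In this minimal sketch the length-5 sequence is arbitrary and assumed to start at $n = 0$. Then $F(e^{j\omega})\, e^{j\omega n}$ is a trigonometric polynomial, so an equispaced Riemann sum over one period with $M$ points recovers $f[n]$ exactly (up to rounding) as long as $M$ is at least the sequence length; the helper names are hypothetical.

```python
import numpy as np


def dtft(f, w):
    """F(e^{jw}) = sum_n f[n] e^{-jwn}, for f supported on n = 0, ..., len(f)-1."""
    n = np.arange(len(f))
    return np.exp(-1j * np.outer(w, n)) @ f


def inverse_dtft(F_vals, w, n):
    """Riemann-sum approximation of (1/2pi) * integral over one period of F(e^{jw}) e^{jwn} dw."""
    dw = w[1] - w[0]
    return (F_vals * np.exp(1j * w * n)).sum() * dw / (2 * np.pi)


f = np.array([1.0, -2.0, 0.5, 3.0, 1.5])
M = 16
w = -np.pi + 2 * np.pi * np.arange(M) / M        # M equispaced points on [-pi, pi)
F_vals = dtft(f, w)

recovered = np.array([inverse_dtft(F_vals, w, n) for n in range(len(f))])
print(np.round(recovered.real, 12))               # matches f
print(np.allclose(dtft(f, w + 2 * np.pi), F_vals))  # 2 pi periodicity: True
```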

Convolution Given two sequences $f[n]$ and $g[n]$ and their discrete-time Fourier transforms $F(e^{j\omega})$ and $G(e^{j\omega})$, then

$$f[n] * g[n] = \sum_{l=-\infty}^{\infty} f[n - l]\, g[l] = \sum_{l=-\infty}^{\infty} f[l]\, g[n - l] \longleftrightarrow F(e^{j\omega})\, G(e^{j\omega}).$$
Parseval's Equality With the same notations as above, we have

$$\sum_{n=-\infty}^{\infty} f^*[n]\, g[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} F^*(e^{j\omega})\, G(e^{j\omega})\, d\omega, \tag{2.4.37}$$

and in particular, when $g[n] = f[n]$,

$$\sum_{n=-\infty}^{\infty} |f[n]|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |F(e^{j\omega})|^2\, d\omega.$$
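Both properties can be checked on short test sequences, chosen arbitrarily for illustration: the DTFT of `np.convolve(f, g)` should equal the product of the individual DTFTs, and $\sum |f[n]|^2$ should match $(1/2\pi)\int |F(e^{j\omega})|^2\, d\omega$. For a sequence of length $L$, $|F(e^{j\omega})|^2$ only contains frequencies $e^{-j\omega m}$ with $|m| \le L-1$, so averaging it over $M \ge L$ equispaced frequencies evaluates the integral exactly (up to rounding). The helper name `dtft` is illustrative.

```python
import numpy as np


def dtft(x, w):
    """DTFT (2.4.32) of x supported on n = 0, ..., len(x)-1, at frequencies w."""
    n = np.arange(len(x))
    return np.exp(-1j * np.outer(w, n)) @ x


f = np.array([1.0, 2.0, -1.0])
g = np.array([0.5, 0.0, 3.0, -2.0])
w = np.linspace(-np.pi, np.pi, 257)

# Convolution property: the DTFT of f * g is F(e^{jw}) G(e^{jw})
conv_ok = np.allclose(dtft(np.convolve(f, g), w), dtft(f, w) * dtft(g, w))

# Parseval with g = f: average |F|^2 over M >= len(f) equispaced frequencies
M = 8
w_grid = 2 * np.pi * np.arange(M) / M
energy_freq = np.mean(np.abs(dtft(f, w_grid)) ** 2)
energy_time = np.sum(np.abs(f) ** 2)
print(conv_ok, energy_time, energy_freq)   # True, then two equal energies (6.0 up to rounding)
```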

2.4.7 Discrete-Time Fourier Series 

If a discrete-time sequence is periodic with period $N$, that is, $f[n] = f[n + lN]$, $l \in \mathbb{Z}$, then its discrete-time Fourier series representation is given by

$$F[k] = \sum_{n=0}^{N-1} f[n]\, W_N^{nk}, \quad k \in \mathbb{Z}, \tag{2.4.38}$$

$$f[n] = \frac{1}{N} \sum_{k=0}^{N-1} F[k]\, W_N^{-nk}, \quad n \in \mathbb{Z}, \tag{2.4.39}$$

where $W_N = e^{-j2\pi/N}$ is an $N$th root of unity. That this is an analysis-synthesis pair is easily verified by using the orthogonality of the roots of unity (see (2.1.3)). Again, all the familiar properties of Fourier transforms hold, taking periodicity into account. For example, convolution is now periodic convolution, that is,

$$f[n] * g[n] = \sum_{l=0}^{N-1} f[n - l]\, g[l] = \sum_{l=0}^{N-1} f_0[(n - l) \bmod N]\, g_0[l], \tag{2.4.40}$$

where $f_0[\cdot]$ and $g_0[\cdot]$ are equal to one period of $f[\cdot]$ and $g[\cdot]$ respectively. That is, $f_0[n] = f[n]$, $n = 0, \ldots, N-1$, and 0 otherwise, and similarly for $g_0[n]$. Then, the convolution property is given by

$$f[n] * g[n] = f_0[n] *_p g_0[n] \longleftrightarrow F[k]\, G[k], \tag{2.4.41}$$

where $*_p$ denotes periodic convolution. Parseval's formula then follows as

$$\sum_{n=0}^{N-1} f^*[n]\, g[n] = \frac{1}{N} \sum_{k=0}^{N-1} F^*[k]\, G[k].$$
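A minimal sketch of (2.4.38)-(2.4.41) and Parseval's formula, using one period of two arbitrary length-6 test sequences. The direct sum with $W_N = e^{-j2\pi/N}$ should agree with `np.fft.fft` (which uses the same sign convention), periodic convolution with indices taken modulo $N$ should turn into a product of coefficients, and the inner products should agree up to the factor $1/N$. The function names are illustrative.

```python
import numpy as np


def dtfs(f0):
    """F[k] = sum_{n=0}^{N-1} f[n] W_N^{nk}, with W_N = exp(-j 2 pi / N), as in (2.4.38)."""
    N = len(f0)
    n = np.arange(N)
    W = np.exp(-2j * np.pi / N)
    return np.array([np.sum(f0 * W ** (n * k)) for k in range(N)])


def periodic_convolution(f0, g0):
    """(2.4.40): sum_{l=0}^{N-1} f0[(n - l) mod N] g0[l], for n = 0, ..., N-1."""
    N = len(f0)
    return np.array([sum(f0[(n - l) % N] * g0[l] for l in range(N)) for n in range(N)])


f0 = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 0.5])
g0 = np.array([0.0, 1.0, -2.0, 1.0, 0.0, 4.0])
F, G = dtfs(f0), dtfs(g0)
N = len(f0)

print(np.allclose(F, np.fft.fft(f0)))                            # same sign convention
print(np.allclose(dtfs(periodic_convolution(f0, g0)), F * G))  # (2.4.41)
print(np.isclose(np.vdot(f0, g0), np.vdot(F, G) / N))          # Parseval
```

`np.vdot` conjugates its first argument, which matches the $f^*[n]$ and $F^*[k]$ in Parseval's formula.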
Just as the Fourier series coefficients were related to the Fourier transform of one period (see (2.4.14)), the coefficients of the discrete-time Fourier series can be obtained from the discrete-time Fourier transform of one period. If we call $F_0(e^{j\omega})$ the discrete-time Fourier transform of $f_0[n]$, (2.4.32) and (2.4.38) imply that

$$F_0(e^{j\omega}) = \sum_{n=-\infty}^{\infty} f_0[n]\, e^{-j\omega n} = \sum_{n=0}^{N-1} f[n]\, e^{-j\omega n},$$

leading to

$$F[k] = F_0(e^{j\omega}) \Big|_{\omega = k2\pi/N}.$$

The sampling of $F_0(e^{j\omega})$ simply repeats copies of $f_0[n]$ at integer multiples of $N$, and thus we have

$$f[n] = \sum_{l=-\infty}^{\infty} f_0[n - lN] = \frac{1}{N} \sum_{k=0}^{N-1} F[k]\, e^{jnk2\pi/N} = \frac{1}{N} \sum_{k=0}^{N-1} F_0(e^{jk2\pi/N})\, e^{jnk2\pi/N}, \tag{2.4.42}$$

which is the discrete-time version of the Poisson sum formula. It actually holds for $f_0[\cdot]$ with support larger than $0, \ldots, N-1$, as long as the first sum in (2.4.42) converges. For $n = 0$, (2.4.42) yields

$$\sum_{l=-\infty}^{\infty} f_0[lN] = \frac{1}{N} \sum_{k=0}^{N-1} F_0(e^{jk2\pi/N}).$$
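The discrete-time Poisson formula (2.4.42) says that sampling the DTFT of a sequence at $N$ frequencies and inverting gives the sequence folded modulo $N$ (time-domain aliasing). The sketch below assumes an arbitrary length-12 sequence $f_0[n] = n + 1$ on $n = 0, \ldots, 11$ and $N = 5$, so the support is longer than $N$ and several copies overlap.

```python
import numpy as np

N = 5
f0 = np.arange(1.0, 13.0)          # f0[n] = n + 1 for n = 0, ..., 11: longer than N
n = np.arange(len(f0))

# Sample the DTFT F0(e^{jw}) at w = 2 pi k / N, k = 0, ..., N-1
w_k = 2 * np.pi * np.arange(N) / N
F0_samples = np.exp(-1j * np.outer(w_k, n)) @ f0

# Right side of (2.4.42): (1/N) sum_k F0(e^{jk2pi/N}) e^{jmk2pi/N}
rhs = np.array([np.mean(F0_samples * np.exp(1j * w_k * m)) for m in range(N)])

# Left side: sum_l f0[m - lN], i.e. f0 folded modulo N
lhs = np.zeros(N)
for idx, value in enumerate(f0):
    lhs[idx % N] += value

print(lhs)                         # [18, 21, 11, 13, 15] (as floats)
print(np.allclose(rhs, lhs))       # True
```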



2.4.8 Discrete Fourier Transform 

The importance of the discrete-time Fourier transform of a finite-length sequence 
(which can be one period of a periodic sequence) leads to the definition of the 
discrete Fourier transform (DFT). This transform is very important for computa- 
tional reasons, since it can be implemented using the fast Fourier transform (FFT) 
algorithm (see Chapter 6). The DFT is defined as

$$F[k] = \sum_{n=0}^{N-1} f[n]\, W_N^{nk}, \quad k = 0, \ldots, N-1, \tag{2.4.43}$$

and its inverse as

$$f[n] = \frac{1}{N} \sum_{k=0}^{N-1} F[k]\, W_N^{-nk}, \quad n = 0, \ldots, N-1, \tag{2.4.44}$$

where $W_N = e^{-j2\pi/N}$. These are the same formulas as (2.4.38-2.4.39), except that $f[n]$ and $F[k]$ are not defined for $n, k \notin \{0, \ldots, N-1\}$. Recall that the discrete-time Fourier transform of a finite-length sequence can be sampled at $\omega = 2\pi k/N$ (which periodizes the sequence). Therefore, it is useful to think of the DFT as the transform of one period of a periodic signal, or a sampling of the DTFT of a finite-length signal. In both cases, there is an underlying periodic signal. Therefore, all properties are with respect to this inherent periodicity. For example, the convolution property of the DFT leads to periodic convolution (see (2.4.40)). Because of the finite-length signals involved, the DFT is a mapping on $\mathbb{C}^N$ and can thus be best represented as a matrix-vector product. Calling $\mathbf{F}$ the Fourier matrix with entries

$$[\mathbf{F}]_{n,k} = W_N^{nk}, \quad n, k = 0, \ldots, N-1,$$

then its inverse is equal to (following (2.4.44))

$$\mathbf{F}^{-1} = \frac{1}{N}\, \mathbf{F}^*. \tag{2.4.45}$$
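The Fourier matrix and (2.4.45) translate directly into NumPy. In this minimal sketch, with $N = 8$ chosen arbitrarily, the code builds $[\mathbf{F}]_{n,k} = W_N^{nk}$, checks that $(1/N)\mathbf{F}^*$ (conjugate transpose) inverts it, and checks that $\mathbf{F}f$ matches `np.fft.fft`; `fourier_matrix` is a hypothetical helper name.

```python
import numpy as np


def fourier_matrix(N):
    """[F]_{n,k} = W_N^{nk}, with W_N = exp(-j 2 pi / N)."""
    idx = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / N)


N = 8
F = fourier_matrix(N)
F_inv = F.conj().T / N                           # (2.4.45)

print(np.allclose(F_inv @ F, np.eye(N)))         # True
x = np.random.default_rng(0).standard_normal(N)
print(np.allclose(F @ x, np.fft.fft(x)))         # True: same sign convention
```

Forming $\mathbf{F}$ explicitly costs $O(N^2)$ storage and arithmetic; it is useful for checking identities like this one, while the FFT computes the same product in $O(N \log N)$.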

Given a sequence $\{f[0], f[1], \ldots, f[N-1]\}$, we can define a circular convolution matrix $\mathbf{C}$ with a first row equal to $\{f[0], f[N-1], \ldots, f[1]\}$ and each subsequent row being a right circular shift of the previous one. Then, circular convolution of $\{f[n]\}$ with a sequence $\{g[n]\}$ can be written as

$$f *_p g = \mathbf{C} g = \mathbf{F}^{-1} \mathbf{\Lambda} \mathbf{F} g,$$

according to the convolution property (2.4.40-2.4.41), where $\mathbf{\Lambda}$ is a diagonal matrix with $F[k]$ on its diagonal. Conversely, this means that $\mathbf{C}$ is diagonalized by $\mathbf{F}$, or that the complex exponential sequences $\{e^{j(2\pi/N)nk}\} = \{W_N^{-nk}\}$ are eigenvectors of the convolution matrix $\mathbf{C}$, with eigenvalues $F[k]$. Note that the time reversal associated with convolution is taken into account in the definition of the circulant matrix $\mathbf{C}$.
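The diagonalization $\mathbf{C} = \mathbf{F}^{-1}\mathbf{\Lambda}\mathbf{F}$ can be checked numerically. The sketch below assumes two arbitrary length-5 test sequences, builds $\mathbf{C}$ row by row as described (first row $f[0], f[N-1], \ldots, f[1]$, each later row a right circular shift), and compares $\mathbf{C}g$ with a circular convolution computed through the FFT.

```python
import numpy as np

f = np.array([1.0, -1.0, 2.0, 0.5, 3.0])
g = np.array([2.0, 0.0, 1.0, -1.0, 4.0])
N = len(f)

# First row: f[0], f[N-1], ..., f[1]; each later row is a right circular shift
first_row = np.concatenate(([f[0]], f[:0:-1]))
C = np.array([np.roll(first_row, r) for r in range(N)])   # C[n, l] = f[(n - l) mod N]

F = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N)
Lam = np.diag(F @ f)                                      # DFT coefficients F[k]

print(np.allclose(C @ g, np.fft.ifft(np.fft.fft(f) * np.fft.fft(g))))  # True
print(np.allclose(C, np.linalg.inv(F) @ Lam @ F))                      # True
```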

Using matrix notation, Parseval's formula for the DFT follows easily. Call $\hat{f}$ the Fourier transform of the vector $f = (f[0]\ f[1]\ \cdots\ f[N-1])^T$, that is

$$\hat{f} = \mathbf{F} f,$$

and a similar definition for $\hat{g}$ as the Fourier transform of $g$. Then

$$\hat{f}^* \hat{g} = (\mathbf{F} f)^* (\mathbf{F} g) = f^* \mathbf{F}^* \mathbf{F} g = N f^* g,$$

where we used (2.4.45), that is, the fact that $\mathbf{F}^*$ is the inverse of $\mathbf{F}$ up to a scale factor of $N$.

Other properties of the DFT follow from their counterparts for the discrete-time 
Fourier transform, bearing in mind the underlying circular structure implied by the 
discrete-time Fourier series (for example, a shift is a circular shift). 
[Figure 2.3: four panels, (a)-(d), each pairing a time-domain signal, $f(t)$ or $f[n]$, with its Fourier representation, $F(\omega)$ or $F[k]$.]

Figure 2.3 Fourier transforms with various combinations of continuous/discrete time and frequency variables (see also Table 2.1). (a) Continuous-time Fourier transform. (b) Continuous-time Fourier series (note that the frequency-domain function is discrete in frequency, appearing at multiples of $2\pi/T$ with weights $F[k]$). (c) Discrete-time Fourier transform (note that the time-domain function is discrete in time, appearing at multiples of $2\pi/\omega_s$ with weights $f[n]$). (d) Discrete-time Fourier series.



2.4.9 Summary of Various Flavors of Fourier Transforms 

Between the Fourier transform, where both time and frequency variables are con- 
tinuous, and the discrete-time Fourier series (DTFS), where both variables are 
discrete, there are a number of intermediate cases. 

First, in Table 2.1 and Figure 2.3, we compare the Fourier transform, Fourier series, discrete-time Fourier transform and discrete-time Fourier series. The table shows four combinations of continuous versus discrete variables in time and frequency. As defined in Section 2.4.1, we use a short-hand CT or DT for continuous- versus discrete-time variable, and we call it a Fourier transform or series if the synthesis formula involves an integral or a summation, respectively.

Then, in Table 2.2 and Figure 2.4, we consider the same transforms but when the signal satisfies some additional restrictions, that is, when it is limited either in time or in frequency. In that case, the continuous function (of time or frequency) can be sampled without loss of information.

Figure 2.4 Fourier transform with length and bandwidth restrictions on the signal (see also Table 2.2). (a) Fourier transform of bandlimited signals, where the time-domain signal can be sampled. Note that the function in frequency domain has support on $(-\omega_s/2, \omega_s/2)$. (b) Fourier transform of finite-length signals, where the frequency-domain signal can be sampled. (c) Fourier series of bandlimited periodic signals (it has a finite number of Fourier components). (d) Discrete-time Fourier transform of finite-length sequences.

2.5 Signal Processing 

This section briefly covers some fundamental notions of continuous- and discrete-time signal processing. Our focus is on linear time-invariant or periodically time-varying systems. For these, weighted complex exponentials play a special role, leading to the Laplace and z-transforms as useful generalizations of the continuous- and discrete-time Fourier transforms. Within this class of systems, we are particularly interested in those having finite-complexity realizations or finite-order differential/difference equations. These will have rational Laplace or z-transforms, which we assume in what follows. For further details, see [211, 212]. We also discuss the basics of multirate signal processing, which is at the heart of the material on discrete-time bases in Chapter 3. More material on multirate signal processing can be found in [67, 308].

2.5.1 Continuous-Time Signal Processing 

Signal processing, which is based on Fourier theory, is concerned with actually 
implementing algorithms. So, for example, the study of filter structures and their 
associated properties is central to the subject. 

The Laplace Transform An extension of the Fourier transform to the complex plane (instead of just the frequency axis) is the following:

$$F(s) = \int_{-\infty}^{\infty} f(t)\, e^{-st}\, dt,$$

where $s = \sigma + j\omega$. This is equivalent, for a given $\sigma$, to the Fourier transform of $f(t) \cdot e^{-\sigma t}$, that is, the transform of an exponentially weighted signal. Now, the above transform does not in general converge for all $s$, that is, associated with it is a region of convergence (ROC). The ROC has the following important properties [212]: The ROC is made up of strips in the complex plane parallel to the $j\omega$-axis. If the $j\omega$-axis is contained in the ROC, then the Fourier transform converges. Note that if the Laplace transform is rational, then the ROC cannot contain any poles. If a signal is right-sided (that is, zero for $t < T_0$) or left-sided (zero for $t > T_1$), then the ROC is right- or left-sided, respectively, in the sense that it extends from some vertical line (corresponding to the limit value of $\mathrm{Re}(s)$ up to which the Laplace transform converges) all the way to $\mathrm{Re}(s)$ becoming plus or minus infinity. It follows that a finite-length signal has the whole complex plane as its ROC (assuming it converges anywhere), since its ROC is both left- and right-sided, and connected.

If a signal is two-sided, that is, neither left- nor right-sided, then its ROC is the 
intersection of the ROC's of its left- and right-sided parts. This ROC is therefore 
either empty or of the form of a vertical strip. 

Given a Laplace transform (such as a rational expression), different ROC's lead 
to different time-domain signals. Let us illustrate this with an example. 

Example 2.1

Assume $F(s) = 1/((s+1)(s+2))$. The ROC $\{\mathrm{Re}(s) < -2\}$ corresponds to a left-sided signal

$$f(t) = -\left(e^{-t} - e^{-2t}\right) u(-t).$$

The ROC $\{\mathrm{Re}(s) > -1\}$ corresponds to a right-sided signal

$$f(t) = \left(e^{-t} - e^{-2t}\right) u(t).$$

Finally, the ROC $\{-2 < \mathrm{Re}(s) < -1\}$ corresponds to a two-sided signal

$$f(t) = -e^{-t}\, u(-t) - e^{-2t}\, u(t).$$

Note that only the right-sided signal would also have a Fourier transform (since its ROC includes the $j\omega$-axis).
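The three signals of Example 2.1 can be checked against $F(s) = 1/((s+1)(s+2))$ by numerical integration at one test point $s$ inside each ROC. The sketch below assumes a plain trapezoidal rule on the truncated time axis $[-60, 60]$; the test points, grid and helper names are arbitrary choices made for illustration. At each chosen $s$, the integrand decays in both directions, which is exactly what membership in the ROC means.

```python
import numpy as np

t = np.linspace(-60.0, 60.0, 240_001)              # step 5e-4
u = (t >= 0).astype(float)                          # unit step u(t)


def laplace_numeric(f_vals, s):
    """Trapezoidal approximation of the integral of f(t) e^{-st} over the grid."""
    y = f_vals * np.exp(-s * t)
    h = t[1] - t[0]
    return h * (y.sum() - 0.5 * (y[0] + y[-1]))


def F(s):
    return 1.0 / ((s + 1) * (s + 2))


signals = {
    # name: (f(t) on the grid, a real test point s inside its ROC)
    "left": (-(np.exp(-t) - np.exp(-2 * t)) * (1 - u), -3.0),       # Re(s) < -2
    "right": ((np.exp(-t) - np.exp(-2 * t)) * u, 1.0),              # Re(s) > -1
    "two-sided": (-np.exp(-t) * (1 - u) - np.exp(-2 * t) * u, -1.5),  # -2 < Re(s) < -1
}
for name, (f_vals, s) in signals.items():
    print(name, laplace_numeric(f_vals, s), F(s))   # each pair agrees closely
```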

For the inversion of the Laplace transform, recall its relation to the Fourier transform of an exponentially weighted signal. Then, it can be shown that its inverse is

$$f(t) = \frac{1}{2\pi j} \int_{\sigma - j\infty}^{\sigma + j\infty} F(s)\, e^{st}\, ds,$$

where $\sigma$ is chosen inside the ROC. We will denote a Laplace transform pair by

$$f(t) \longleftrightarrow F(s), \quad s \in \mathrm{ROC}.$$

For a review of Laplace transform properties, see [212]. Next, we will concentrate on filtering only.

Linear Time-Invariant Systems The convolution theorem of the Laplace transform follows immediately from the fact that exponentials are eigenfunctions of the convolution operator. For, if $f(t) = h(t) * g(t)$ and $h(t) = e^{st}$, then

$$f(t) = \int h(t - \tau)\, g(\tau)\, d\tau = \int e^{s(t-\tau)}\, g(\tau)\, d\tau = e^{st} \int e^{-s\tau}\, g(\tau)\, d\tau = e^{st}\, G(s).$$

The eigenvalue attached to $e^{st}$ is the Laplace transform of $g(t)$ at $s$. Thus,

$$f(t) = h(t) * g(t) \longleftrightarrow F(s) = H(s)\, G(s),$$
with an ROC containing the intersection of the ROC's of $H(s)$ and $G(s)$. The differentiation property of the Laplace transform says that

$$\frac{d f(t)}{dt} \longleftrightarrow s\, F(s),$$

with ROC containing the ROC of $F(s)$. Then, it follows that linear constant-coefficient differential equations can be characterized by a Laplace transform called the transfer function $H(s)$. Linear, time-invariant differential equations, given by

$$\sum_{k=0}^{N} a_k \frac{d^k y(t)}{dt^k} = \sum_{k=0}^{M} b_k \frac{d^k x(t)}{dt^k}, \tag{2.5.1}$$

lead, after taking the Laplace transform, to the following ratio:

$$H(s) = \frac{Y(s)}{X(s)} = \frac{\sum_{k=0}^{M} b_k s^k}{\sum_{k=0}^{N} a_k s^k},$$

that is, the input and the output are related by a convolution with a filter having impulse response $h(t)$, where $h(t)$ is the inverse Laplace transform of $H(s)$.

To take this inverse Laplace transform, we need to specify the ROC. Typically, we look for a causal solution, where we solve the differential equation forward in time. Then, the ROC extends to the right of the vertical line which passes through the rightmost pole. Stability¹¹ of the filter corresponding to the transfer function requires that the ROC include the $j\omega$-axis. This leads to the well-known requirement that a causal system with rational transfer function is stable if and only if all the poles are in the left half-plane (the real part of the pole location is smaller than zero). In the above discussion, we have assumed initial rest conditions, that is, the homogeneous solution of the differential equation (2.5.1) is zero (otherwise, the system is neither linear nor time-invariant).

Example 2.2 Butterworth Filters

Among various classes of continuous-time filters we will briefly describe the Butterworth filters, both because they are simple and because they will reappear later as useful filters in the context of wavelets. The magnitude squared of the Fourier transform of an $N$th-order Butterworth filter is given by

$$|H_N(j\omega)|^2 = \frac{1}{1 + (j\omega/j\omega_c)^{2N}}, \tag{2.5.2}$$

where $\omega_c$ is a parameter which will specify the cutoff frequency beyond which sinusoids are substantially attenuated. Thus, $\omega_c$ defines the bandwidth of the lowpass Butterworth filter.

11 Stability of a filter means that a bounded input produces a bounded output.
Since $|H_N(j\omega)|^2 = H(j\omega)\, H^*(j\omega) = H(j\omega)\, H(-j\omega)$ when the filter is real, and noting that (2.5.2) is the Laplace transform for $s = j\omega$, we get

$$H(s)\, H(-s) = \frac{1}{1 + (s/j\omega_c)^{2N}}. \tag{2.5.3}$$

The poles of $H(s)H(-s)$ are thus at $(-1)^{1/(2N)}\, j\omega_c$, or

$$|s_k| = \omega_c, \qquad \arg[s_k] = \frac{\pi(2k+1)}{2N} + \frac{\pi}{2},$$

and $k = 0, \ldots, 2N-1$. The poles thus lie on a circle, and they appear in pairs at $\pm s_k$. To get a stable and causal filter, one simply chooses the $N$ poles which lie on the left-hand side half-circle. Since pole locations specify the filter only up to a scale factor, set $s = 0$ in (2.5.3), which leads to $H(0) = 1$. For example, a second-order Butterworth filter has the following Laplace transform:

$$H_2(s) = \frac{\omega_c^2}{s^2 + \sqrt{2}\,\omega_c s + \omega_c^2}. \tag{2.5.4}$$

One can find its "physical" implementation by going back, through the inverse Laplace transform, to the equivalent linear constant-coefficient differential equation. See also Example 3.6 in Chapter 3, for discrete-time Butterworth filters.
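The pole construction of Example 2.2 fits in a few lines. The sketch below, assuming NumPy, places the $2N$ roots of $1 + (s/j\omega_c)^{2N}$ at angles $\pi/2 + \pi(2k+1)/(2N)$, keeps the $N$ in the left half-plane, normalizes so that $H(0) = 1$, and checks the result against (2.5.2) and (2.5.4). The function names and the value $\omega_c = 2$ are illustrative choices.

```python
import numpy as np


def butterworth_denominator(N, wc=1.0):
    """Monic denominator (highest power first) of the stable, causal Butterworth H(s)."""
    k = np.arange(2 * N)
    poles = wc * np.exp(1j * (np.pi / 2 + np.pi * (2 * k + 1) / (2 * N)))
    left = poles[poles.real < 0]           # the N poles on the left half-circle
    return np.real(np.poly(left))          # conjugate pairs give real coefficients


def butterworth_response(N, wc, w):
    """H(jw) = D(0) / D(jw), scaled so that H(0) = 1."""
    a = butterworth_denominator(N, wc)
    return np.polyval(a, 0.0) / np.polyval(a, 1j * w)


wc = 2.0
print(butterworth_denominator(2, wc))      # about [1, 2.828, 4]: s^2 + sqrt(2) wc s + wc^2

w = np.linspace(0.0, 10.0, 201)
for N in (1, 2, 5):
    mag_sq = np.abs(butterworth_response(N, wc, w)) ** 2
    print(N, np.allclose(mag_sq, 1.0 / (1.0 + (w / wc) ** (2 * N))))   # True
```

No pole falls exactly on the imaginary axis, since that would require $(2k+1)/(2N)$ to be an integer, so the `poles.real < 0` test always selects exactly $N$ poles.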

2.5.2 Discrete-Time Signal Processing 

Just as the Laplace transform was a generalization of the Fourier transform, the 
z-transform will be introduced as a generalization of the discrete-time Fourier trans- 
form [149]. Again, it will be most useful for the study of difference equations (the 
discrete-time equivalent of differential equations) and the associated discrete-time 
filters. 

The z-Transform The forward z-transform is defined as

$$F(z) = \sum_{n=-\infty}^{\infty} f[n]\, z^{-n}, \tag{2.5.5}$$

where $z \in \mathbb{C}$. On the unit circle $z = e^{j\omega}$, this is the discrete-time Fourier transform (2.4.32), and for $z = \rho e^{j\omega}$, it is the discrete-time Fourier transform of the sequence $f[n] \cdot \rho^{-n}$. Similarly to the Laplace transform, there is a region of convergence (ROC) associated with the z-transform $F(z)$, namely a region of the complex plane where $F(z)$ converges. Consider the case where the z-transform is rational and the sequence is bounded in amplitude. The ROC does not contain any pole. If the sequence is right-sided (left-sided), the ROC extends outward (inward) from a circle with the radius corresponding to the modulus of the outermost (innermost) pole. If the sequence is two-sided, the ROC is a ring. The discrete-time Fourier transform converges absolutely if and only if the ROC contains the unit circle. From the above discussion, it is clear that the unit circle in the z-plane of the z-transform and the $j\omega$-axis in the s-plane of the Laplace transform play equivalent roles.

Also, just as in the Laplace transform, a given z-transform corresponds to different signals, depending on the ROC attached to it.

The inverse z-transform involves contour integration in the ROC and Cauchy's integral theorem [211]. If the contour of integration is the unit circle, the inversion formula reduces to the discrete-time Fourier transform inversion (2.4.33). On circles centered at the origin but of radius $\rho$ different from 1, one can think of forward and inverse z-transforms as the Fourier analysis and synthesis of a sequence $f'[n] = \rho^{-n} f[n]$. Thus, convergence properties are as for the Fourier transform of the exponentially weighted sequence. In the ROC, we can write formally a z-transform pair as

$$f[n] \longleftrightarrow F(z), \quad z \in \mathrm{ROC}.$$

When z-transforms are rational functions, the inversion is best done by partial fraction expansion followed by term-wise inversion. For this, the z-transform pairs

$$a^n u[n] \longleftrightarrow \frac{1}{1 - az^{-1}}, \quad |z| > |a|, \tag{2.5.6}$$

and

$$-a^n u[-n-1] \longleftrightarrow \frac{1}{1 - az^{-1}}, \quad |z| < |a|, \tag{2.5.7}$$

are useful, where $u[n]$ is the unit-step function ($u[n] = 1$ for $n \ge 0$, and 0 otherwise). The above transforms follow from the definition (2.5.5) and the sum of geometric series, and they are a good example of identical z-transforms with different ROC's corresponding to different signals.

As a simple example, consider the sequence

$$f[n] = a^{|n|},$$

which, following (2.5.6-2.5.7), has a z-transform

$$F(z) = \frac{1}{1 - az^{-1}} - \frac{1}{1 - (1/a)\, z^{-1}}, \quad \mathrm{ROC}\ |a| < |z| < \frac{1}{|a|},$$

that is, a nonempty ROC only if $|a| < 1$. For more z-transform properties, see [211].
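The two-sided example can be checked by summing the series (2.5.5) directly at a point inside the ROC. The sketch assumes $a = 0.5$, so the ROC is $0.5 < |z| < 2$, and an arbitrary test point $z = 1.2\, e^{j0.7}$. The sum is truncated to $|n| \le 200$; the neglected terms are smaller than $0.6^{200}$ in magnitude.

```python
import numpy as np

a = 0.5
z = 1.2 * np.exp(0.7j)                      # 0.5 < |z| = 1.2 < 2: inside the ROC

n = np.arange(-200, 201)
series = np.sum(a ** np.abs(n) * z ** (-n.astype(float)))

# (2.5.6) for the causal part, (2.5.7) with a replaced by 1/a for the anticausal part
closed_form = 1 / (1 - a / z) - 1 / (1 - 1 / (a * z))
print(series, closed_form)                  # the two complex numbers agree
```

Moving $z$ outside the ring, for example to $|z| = 3$, makes the anticausal terms $(a|z|)^{|n|}$ grow instead of decay, so the truncated sum no longer settles to a limit.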
Convolutions, Difference Equations and Discrete-Time Filters Just as in continuous time, complex exponentials are eigenfunctions of the convolution operator. That is, if $f[n] = h[n] * g[n]$ and $h[n] = z^n$, $z \in \mathbb{C}$, then

$$f[n] = \sum_k h[n - k]\, g[k] = \sum_k z^{n-k}\, g[k] = z^n \sum_k z^{-k}\, g[k] = z^n\, G(z).$$

The z-transform $G(z)$ is thus the eigenvalue of the convolution operator for that particular value of $z$. The convolution theorem follows as

$$f[n] = h[n] * g[n] \longleftrightarrow F(z) = H(z)\, G(z),$$

with an ROC containing the intersection of the ROC's of $H(z)$ and $G(z)$. Convolution with a time-reversed filter can be expressed as an inner product,

$$f[n] = \sum_k x[k]\, \tilde{h}[n - k] = \sum_k x[k]\, h[k - n] = \langle x[k], h[k - n] \rangle,$$

where "$\sim$" denotes time reversal, $\tilde{h}[n] = h[-n]$.

It is easy to verify that the "delay by one" operator, that is, a discrete-time filter with impulse response $\delta[n-1]$, has a z-transform $z^{-1}$. That is why $z^{-1}$ is often called a delay, and it is used in block diagrams to denote a delay. Then, given $x[n]$ with the z-transform $X(z)$, $x[n-k]$ has a z-transform

$$x[n - k] \longleftrightarrow z^{-k} X(z).$$

Thus, a linear constant-coefficient difference equation can be analyzed with the z-transform, leading to the notion of a transfer function. We assume initial rest conditions in the following, that is, all delay operators are set to zero initially. Then, the homogeneous solution to the difference equation is zero. Assume a linear, time-invariant difference equation given by

$$\sum_{k=0}^{N} a_k\, y[n - k] = \sum_{k=0}^{M} b_k\, x[n - k], \tag{2.5.8}$$

and taking its z-transform using the delay property, we get the transfer function as the ratio of the output and input z-transforms,

$$H(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{k=0}^{M} b_k z^{-k}}{\sum_{k=0}^{N} a_k z^{-k}}.$$

The output is related to the input by a convolution with a discrete-time filter having as impulse response $h[n]$, the inverse z-transform of $H(z)$. Again, the ROC depends on whether we wish a causal¹² or an anticausal solution, and the system is stable if and only if the ROC includes the unit circle. This leads to the conclusion that a causal system with rational transfer function is stable if and only if all poles are inside the unit circle (their modulus is smaller than one).

Note, however, that a system with poles inside and outside the unit circle can still correspond to a stable system (but not a causal one). Simply gather poles inside the unit circle into a causal impulse response, while poles outside correspond to an anticausal impulse response, and thus, the stable impulse response is two-sided.
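A direct implementation of (2.5.8) under initial rest conditions, followed by the pole test for causal stability, can be sketched as below. The example transfer function $H(z) = 1/(1 - 0.5z^{-1})$ is an arbitrary choice; by (2.5.6) its causal impulse response is $0.5^n u[n]$. The code assumes $a_0 \neq 0$ and ignores possible pole-zero cancellations; the function names are hypothetical.

```python
import numpy as np


def difference_equation(b, a, x):
    """Run (2.5.8) forward in time from initial rest, assuming a[0] != 0."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y[n] = acc / a[0]
    return y


def is_causal_stable(a):
    """True if all roots of the denominator (the poles) have modulus < 1."""
    # z^N A(z) = a_0 z^N + a_1 z^(N-1) + ... + a_N, so np.roots takes a as is
    poles = np.roots(a)
    return bool(np.all(np.abs(poles) < 1))


b, a = [1.0], [1.0, -0.5]                   # H(z) = 1 / (1 - 0.5 z^-1)
impulse = np.zeros(20)
impulse[0] = 1.0
h = difference_equation(b, a, impulse)

print(np.allclose(h, 0.5 ** np.arange(20)))                          # True, as (2.5.6) predicts
print(is_causal_stable([1.0, -0.5]), is_causal_stable([1.0, -2.0]))  # True False
```

This loop is exactly the direct-form realization mentioned below; it is convenient for checking but, as noted, not the most robust structure under coefficient quantization.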

From a transfer function given by a z-transform it is always possible to get a 
difference equation and thus a possible hardware implementation. However, many 
different realizations have the same transfer function and depending on the ap- 
plication, certain realizations will be vastly superior to others (for example, in 
finite-precision implementation). Let us just mention that the most obvious implementation, which realizes the difference equation (2.5.8) and is called the direct-form implementation, is poor as far as coefficient quantization is concerned. A better solution is obtained by factoring $H(z)$ into first-order factors (real roots) and/or second-order factors (complex-conjugate root pairs) and implementing a cascade of such factors. For a detailed discussion of numerical behavior of filter structures see [211].

Autocorrelation and Spectral Factorization An important concept which we
will use later in the book is that of deterministic autocorrelation (autocorrelation
in the statistical sense will be discussed in Chapter 7, Appendix 7.A). We will say
that

$$ p[m] = \langle h[n], h[n+m] \rangle = \sum_{n} h^*[n]\, h[n+m] $$

is the deterministic autocorrelation (or, simply autocorrelation from now on) of the
sequence h[n]. In the Fourier domain, we have that

$$ P(e^{j\omega}) = \sum_{n=-\infty}^{\infty} p[n]\, e^{-j\omega n} = \sum_{n=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} h^*[k]\, h[k+n]\, e^{-j\omega n} = H^*(e^{j\omega})\, H(e^{j\omega}) = |H(e^{j\omega})|^2, $$

that is, P(e^{jω}) is a nonnegative function on the unit circle. In other words, the
following is a Fourier-transform pair:

$$ p[m] = \langle h[n], h[n+m] \rangle \;\longleftrightarrow\; P(e^{j\omega}) = |H(e^{j\omega})|^2. $$

Similarly, in the z-domain, the following is a transform pair:

$$ p[m] = \langle h[n], h[n+m] \rangle \;\longleftrightarrow\; P(z) = H(z)\, H_*(1/z) $$

(recall that the subscript * implies conjugation of the coefficients but not of z).
Note that from the above, it is obvious that if z_k is a zero of P(z), so is 1/z_k^*
(this also means that zeros on the unit circle are of even multiplicity). When h[n] is
real and z_k is a zero of H(z), then z_k^*, 1/z_k and 1/z_k^* are zeros of P(z) as well
(they are not necessarily different).

¹² A discrete-time sequence x[n] is said to be causal if x[n] = 0 for n < 0.
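To make the pair p[m] ⟷ |H(e^{jω})|² concrete, the sketch below computes the deterministic autocorrelation of a short complex sequence by direct summation, using the convention p[m] = Σ_n h^*[n] h[n + m] from above, and compares its DTFT with |H(e^{jω})|² at a few frequencies. The helper names and the test sequence are illustrative assumptions.

```python
import cmath


def autocorrelation(h):
    """p[m] = sum_n conj(h[n]) h[n+m] for a finite h; returns {m: p[m]}, |m| < len(h)."""
    L = len(h)
    return {
        m: sum(h[n].conjugate() * h[n + m] for n in range(L) if 0 <= n + m < L)
        for m in range(-(L - 1), L)
    }


def dtft(seq, omega):
    """sum_n seq[n] e^{-j omega n} for a finite sequence stored as {n: value}."""
    return sum(v * cmath.exp(-1j * omega * n) for n, v in seq.items())


h = [1.0 + 0.5j, -0.3, 0.2j]
p = autocorrelation(h)
h_dict = dict(enumerate(h))
for omega in (0.0, 0.7, 2.0):
    P = dtft(p, omega)
    H = dtft(h_dict, omega)
    # P(e^{jw}) should be real, nonnegative and equal to |H(e^{jw})|^2.
    print(omega, P, abs(H) ** 2)
```

The Hermitian symmetry p[−m] = p^*[m] is what makes P(e^{jω}) real.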

Suppose now that we are given an autocorrelation function P(z) and we want
to find H(z). Here, H(z) is called a spectral factor of P(z) and the technique of
extracting it, spectral factorization. These spectral factors are not unique, and are
obtained by assigning one zero out of each zero pair to H(z) (we assume here that
p[m] is FIR, otherwise allpass functions (2.5.10) can be involved). The choice of
which zeros to assign to H(z) leads to different spectral factors. To obtain a spectral
factor, first factor P(z) into its zeros as follows:

$$ P(z) = c \prod_{i=1}^{N_u} \big( (1 - z_{1i} z^{-1})(1 - z_{1i}^* z) \big) \prod_{i=1}^{N} (1 - z_{2i} z^{-1}) \prod_{i=1}^{N} (1 - z_{2i}^* z), $$

where c > 0 is a constant, the first product contains the zeros on the unit circle,
and thus |z_{1i}| = 1, and the last two contain pairs of zeros inside/outside the unit
circle, respectively. In that case, |z_{2i}| < 1. To obtain the various H(z), one has to
take one zero out of each zero pair on the unit circle, as well as one of the two zeros
inside/outside the unit circle. Note that all these solutions have the same magnitude
response but different phase behavior. An important case is the minimum phase
solution, which is the one, among all causal spectral factors, that has the smallest
phase term. To get a minimum phase solution, we will consistently choose the zeros
inside the unit circle. Thus, H(z) would be of the form

$$ H(z) = \sqrt{c} \prod_{i=1}^{N_u} (1 - z_{1i} z^{-1}) \prod_{i=1}^{N} (1 - z_{2i} z^{-1}). $$
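A small worked case of spectral factorization, under the assumption of a real three-tap autocorrelation P(z) = p[1]z + p[0] + p[1]z^{-1} with p[0] > 2|p[1]| > 0, so that its two zeros r and 1/r are real and off the unit circle. The sequences h = [1, −2] (zero at 2, outside) and h = [2, −1] (zero at 1/2, inside) share the autocorrelation p[0] = 5, p[±1] = −2; the minimum phase factor keeps the zero inside. The helper name is hypothetical, and the quadratic formula only covers this two-zero case; the general case needs polynomial root finding.

```python
import math


def min_phase_factor_3tap(p0, p1):
    """Minimum-phase spectral factor of P(z) = p1*z + p0 + p1*z^-1 (real p0, p1).

    Assumes p1 != 0 and p0 > 2*|p1|, so that the zeros r and 1/r of P(z) are real
    and off the unit circle. Returns [h0, h1] with H(z) = h0 + h1 z^-1 having
    its zero r inside the unit circle.
    """
    if p1 == 0 or p0 <= 2 * abs(p1):
        raise ValueError("need p1 != 0 and p0 > 2|p1|")
    # Zeros of z*P(z) = p1 z^2 + p0 z + p1; their product is 1, so they pair as (r, 1/r).
    disc = math.sqrt(p0 * p0 - 4 * p1 * p1)
    roots = [(-p0 + disc) / (2 * p1), (-p0 - disc) / (2 * p1)]
    r = min(roots, key=abs)  # keep the zero inside the unit circle
    # H(z) = c (1 - r z^-1) has p[0] = c^2 (1 + r^2), hence c = sqrt(p0 / (1 + r^2)).
    c = math.sqrt(p0 / (1 + r * r))
    return [c, -c * r]


print(min_phase_factor_3tap(5.0, -2.0))  # [2.0, -1.0]
```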

Examples of Discrete-Time Filters Discrete-time filters come in two major
classes. The first class consists of infinite impulse response (IIR) filters, which
correspond to difference equations where the present output depends on past
outputs (that is, N ≥ 1 in (2.5.8)). IIR filters often depend on a finite number of
past outputs (N < ∞), in which case the transfer function is a ratio of polynomials
in z^{-1}. Often, by abuse of language, we will call an IIR filter a filter with a rational
transfer function. The second class corresponds to nonrecursive, or finite impulse
response (FIR) filters, where the output only depends on the inputs (or N = 0 in
(2.5.8)). The z-transform is thus a polynomial in z^{-1}. An important class of FIR
filters are those which have symmetric or antisymmetric impulse responses, because
this leads to a linear phase behavior of their Fourier transform. Consider causal
FIR filters of length L. When the impulse response is symmetric, one can write

$$ H(e^{j\omega}) = e^{-j\omega (L-1)/2} A(\omega), $$

where L is the length of the filter, and A(ω) is a real function of ω. Thus, the phase
is a linear function of ω. Similarly, when the impulse response is antisymmetric,
one can write

$$ H(e^{j\omega}) = j\, e^{-j\omega (L-1)/2} B(\omega), $$

where B(ω) is a real function of ω. Here, the phase is an affine function of ω (but is
usually called linear phase).
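The linear-phase property can be checked numerically. For a symmetric impulse response of length L, multiplying H(e^{jω}) by e^{jω(L−1)/2} should leave the real number A(ω); for an antisymmetric one, the result should be purely imaginary, jB(ω). The taps below are arbitrary illustrative values.

```python
import cmath


def freq_response(h, omega):
    """H(e^{j omega}) = sum_n h[n] e^{-j omega n} for a causal FIR filter h."""
    return sum(hn * cmath.exp(-1j * omega * n) for n, hn in enumerate(h))


def remove_linear_phase(h, omega):
    """Return e^{j omega (L-1)/2} H(e^{j omega}): A(omega) or j B(omega)."""
    L = len(h)
    return cmath.exp(1j * omega * (L - 1) / 2) * freq_response(h, omega)


symmetric = [0.1, 0.4, 0.7, 0.4, 0.1]    # h[n] = h[L-1-n]
antisymmetric = [0.3, -0.5, 0.5, -0.3]   # h[n] = -h[L-1-n]
for omega in (0.3, 1.1, 2.5):
    print(remove_linear_phase(symmetric, omega))      # imaginary part ~ 0
    print(remove_linear_phase(antisymmetric, omega))  # real part ~ 0
```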

One way to design discrete-time filters is by transformation of an analog filter.
For example, one can sample the impulse response of the analog filter if its
magnitude frequency response is close enough to being bandlimited. Another approach
consists of mapping the s-plane of the Laplace transform into the z-plane. From
our previous discussion of the relationship between the two planes, it is clear that
the jω-axis should map into the unit circle and the left half-plane should become
the inside of the unit circle in order to preserve stability. Such a mapping is given
by the bilinear transformation [211]

$$ B(z) = \beta\, \frac{1 - z^{-1}}{1 + z^{-1}}. $$

Then, the discrete-time filter H_d is obtained from a continuous-time filter H_c by
setting

$$ H_d(z) = H_c(B(z)). $$

Considering what happens on the jω-axis and the unit circle, it can be verified that
the bilinear transform warps the frequency axis as ω = 2 arctan(ω_c/β), where ω
and ω_c are the discrete and continuous frequency variables, respectively.
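The warping relation can be checked directly. On the unit circle, B(e^{jω}) = β(1 − e^{−jω})/(1 + e^{−jω}) = jβ tan(ω/2), so the analog frequency reached is ω_c = β tan(ω/2), which inverts to ω = 2 arctan(ω_c/β). The value β = 2 is an arbitrary choice for the demonstration.

```python
import cmath
import math


def bilinear_map(z, beta):
    """B(z) = beta (1 - z^-1) / (1 + z^-1): maps the z-plane to the s-plane."""
    return beta * (1 - 1 / z) / (1 + 1 / z)


beta = 2.0
for omega in (0.2, 1.0, 2.5):
    s = bilinear_map(cmath.exp(1j * omega), beta)
    omega_c = s.imag  # s is (numerically) purely imaginary: s = j omega_c
    print(omega, s.real, omega_c, 2 * math.atan(omega_c / beta))

# A point inside the unit circle lands in the left half-plane.
print(bilinear_map(0.5 + 0.2j, beta).real < 0)  # True
```

Since Re[(z − 1)/(z + 1)] = (|z|² − 1)/|z + 1|², the sign of the real part is fixed by |z| < 1 or |z| > 1, which is why stability is preserved.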

As an example, the discrete-time Butterworth filter has a magnitude frequency
response equal to

$$ |H(e^{j\omega})|^2 = \frac{1}{1 + \left( \tan(\omega/2) / \tan(\omega_0/2) \right)^{2N}}. \tag{2.5.9} $$

This squared magnitude is flat at the origin, in the sense that its first 2N − 1
derivatives are zero at ω = 0. Note that since we have a closed-form factorization of
the continuous-time Butterworth filter (see (2.5.4)), it is best to apply the bilinear
transform to the factored form rather than factoring (2.5.9) in order to obtain
H(e^{jω}) in its cascade form.
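For the simplest case N = 1, (2.5.9) can be checked by applying the bilinear transform to the first-order analog Butterworth lowpass H_c(s) = 1/(1 + s/Ω_c). Here Ω_c = β tan(ω_0/2) is prewarped so that the analog cutoff lands at ω_0; this prewarping step and the chosen values of ω_0 and β are assumptions of the sketch, and the result does not depend on β.

```python
import cmath
import math


def butterworth1_bilinear(omega, omega0, beta=1.0):
    """|H_d(e^{j omega})|^2 for H_d(z) = H_c(B(z)) with H_c(s) = 1 / (1 + s / Omega_c)."""
    z = cmath.exp(1j * omega)
    s = beta * (1 - 1 / z) / (1 + 1 / z)   # bilinear transform B(z)
    omega_c = beta * math.tan(omega0 / 2)  # prewarped analog cutoff
    return abs(1 / (1 + s / omega_c)) ** 2


def butterworth_formula(omega, omega0, order):
    """Squared magnitude (2.5.9) of the discrete-time Butterworth filter."""
    ratio = math.tan(omega / 2) / math.tan(omega0 / 2)
    return 1 / (1 + ratio ** (2 * order))


omega0 = 0.4 * math.pi
for omega in (0.1, omega0, 2.0):
    print(butterworth1_bilinear(omega, omega0), butterworth_formula(omega, omega0, 1))
# At omega = omega0 both values equal 1/2.
```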

Instead of the above indirect construction, one can design discrete-time filters
directly. This leads to better designs at a given complexity of the filter or,
conversely, to lower-complexity filters for a given filtering performance.

In the particular case of FIR linear phase filters (that is, a finite-length symmetric
or antisymmetric impulse response), a powerful design method called the
Parks-McClellan algorithm [211] leads to optimal filters in the minimax sense (the 
maximum deviation from the desired Fourier transform magnitude is minimized). 
The resulting approximation of the desired frequency response becomes equiripple 
both in the passband and stopband (the approximation error is evenly spread out). 
It is thus very different from a monotonically decreasing approximation as achieved 
by a Butterworth filter. 

Finally, we discuss the allpass filter, which is an example of what could be called
a unitary filter. An allpass filter has the property that

$$ |H_{ap}(e^{j\omega})| = 1, \tag{2.5.10} $$

for all ω. Calling y[n] the output of the allpass when x[n] is input, we have

$$ \|y\|^2 = \frac{1}{2\pi} \|Y(e^{j\omega})\|^2 = \frac{1}{2\pi} \|H_{ap}(e^{j\omega})\, X(e^{j\omega})\|^2 = \frac{1}{2\pi} \|X(e^{j\omega})\|^2 = \|x\|^2, $$

which means it conserves the energy of the signal it filters. An elementary
single-pole/zero allpass filter is of the following form (see also Appendix 3.A in
Chapter 3):

$$ H_{ap}(z) = \frac{z^{-1} - a^*}{1 - a z^{-1}}. \tag{2.5.11} $$

Writing the pole location as a = ρe^{jθ}, the zero is at 1/a^* = (1/ρ)e^{jθ}. A general
allpass filter is made up of elementary sections as in (2.5.11),

$$ H_{ap}(z) = \prod_{i=1}^{N} \frac{z^{-1} - a_i^*}{1 - a_i z^{-1}} = \frac{\tilde{P}(z)}{P(z)}, \tag{2.5.12} $$

where P(z) = ∏_{i=1}^{N} (1 − a_i z^{-1}) and P̃(z) = z^{-N} P_*(z^{-1}) is the time-reversed
and coefficient-conjugated version of P(z) (recall that the subscript * stands for
conjugation of the coefficients of the polynomial, but not of z). On the unit circle,

$$ H_{ap}(e^{j\omega}) = e^{-j\omega N}\, \frac{P^*(e^{j\omega})}{P(e^{j\omega})}, $$

where P^*(e^{jω}) denotes the complex conjugate of P(e^{jω}), and property (2.5.10)
follows easily. That all rational functions satisfying (2.5.10) can be factored as in
(2.5.12) is shown in [308].
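As a numerical illustration of (2.5.10) and of the energy argument, the sketch below filters a short signal with the elementary section (2.5.11), H_ap(z) = (z^{-1} − a^*)/(1 − a z^{-1}), through its difference equation y[n] = a y[n−1] + x[n−1] − a^* x[n]. The impulse response is infinite, so the output is computed over a long horizon (an approximation whose truncated tail is negligible here since |a|^400 is tiny). The pole a = 0.6e^{j0.8} and the input are arbitrary choices.

```python
import cmath


def allpass1(x, a, n_out):
    """First-order allpass H(z) = (z^-1 - conj(a)) / (1 - a z^-1), with |a| < 1.

    Runs y[n] = a y[n-1] + x[n-1] - conj(a) x[n] for n_out samples, padding the
    input with zeros (n_out >= len(x) assumed) and using zero initial conditions.
    """
    xs = list(x) + [0.0] * (n_out - len(x))
    y = []
    prev_y, prev_x = 0.0, 0.0
    for n in range(n_out):
        yn = a * prev_y + prev_x - a.conjugate() * xs[n]
        y.append(yn)
        prev_y, prev_x = yn, xs[n]
    return y


def energy(seq):
    return sum(abs(v) ** 2 for v in seq)


a = 0.6 * cmath.exp(0.8j)
x = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5]
y = allpass1(x, a, 400)
print(energy(x), energy(y))  # equal up to rounding and the truncated tail

# |H(e^{jw})| = 1 on the unit circle.
for w in (0.0, 1.0, 2.5):
    z = cmath.exp(1j * w)
    print(abs((1 / z - a.conjugate()) / (1 - a / z)))
```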

2.5.3 Multirate Discrete-Time Signal Processing 

As implied by its name, multirate signal processing deals with discrete-time
sequences taken at different rates. While one can always go back to an underlying
continuous-time signal and resample it at a different rate, most often the rate
changes are done in the discrete-time domain. We review some of the key
results. For further details, see [67] and [308].

Sampling Rate Changes Downsampling or subsampling¹³ a sequence x[n] by an
integer factor N results in a sequence y[n] given by

$$ y[n] = x[nN], $$

that is, all samples with indexes modulo N different from zero are discarded. In
the Fourier domain, we get

$$ Y(e^{j\omega}) = \frac{1}{N} \sum_{k=0}^{N-1} X\big(e^{j(\omega - 2\pi k)/N}\big), \tag{2.5.13} $$

that is, the spectrum is stretched by N, and (N − 1) aliased versions at multiples
of 2π are added. They are called aliased because they are copies of the original
spectrum (up to a stretch) but shifted in frequency. That is, low-frequency
components will be replicated at the aliasing frequencies ω_i = 2πi/N, as will high
frequencies (with an appropriate shift). Thus, some high-frequency sinusoid might
have a low-frequency alias. Note that the aliased components are nonharmonically
related to the original frequency component, a fact that can be very disturbing in
applications such as audio. Sometimes, it is useful to extend the above relation to
the z-transform domain:

$$ Y(z) = \frac{1}{N} \sum_{k=0}^{N-1} X\big(W_N^k z^{1/N}\big), \tag{2.5.14} $$

where W_N = e^{-j2π/N}, as usual. To prove (2.5.14), consider first a signal x'[n] which
equals x[n] at multiples of N, and 0 elsewhere. If x[n] has z-transform X(z), then
X'(z) equals

$$ X'(z) = \frac{1}{N} \sum_{k=0}^{N-1} X\big(W_N^k z\big), \tag{2.5.15} $$

as can be shown by using the orthogonality of the roots of unity (2.1.3). To obtain
y[n] from x'[n], one has to drop the extra zeros between the nonzero terms or
contract the signal by a factor of N. This is obtained by substituting z^{1/N} for z in
(2.5.15), leading to (2.5.14). Note that (2.5.15) contains the signal X as well as its
N − 1 modulated versions (on the unit circle, X(W_N^k e^{jω}) = X(e^{j(ω − 2πk/N)})). This
is the reason why in Chapter 3, we will call the analysis dealing with X(W_N^k z)
modulation-domain analysis.

¹³ Sometimes, the term decimation is used even though it historically stands for "keep 9 out of
10" in reference to a Roman practice of killing every tenth soldier of a defeated army.

Figure 2.5 Downsampling by 3 in the frequency domain. (a) Original spectrum
(we assume a real spectrum for simplicity). (b) The three stretched replicas and
the sum Y(e^{jω}).

An alternative proof of (2.5.13) (which is (2.5.14) on the unit circle) consists
of going back to the underlying continuous-time signal and resampling with an
N-times larger sampling period. This is considered in Problem 2.10.
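Relation (2.5.14) involves z^{1/N}. To test it numerically without picking a branch of the N-th root, one can evaluate both sides at z = w^N for an arbitrary nonzero w, where it reads Y(w^N) = (1/N) Σ_k X(W_N^k w). The sketch below does this for a finite causal sequence and also checks (2.5.15); the sequence and evaluation points are arbitrary.

```python
import cmath


def z_transform(x, z):
    """X(z) = sum_n x[n] z^-n for a finite causal sequence x[0..]."""
    return sum(xn * z ** (-n) for n, xn in enumerate(x))


def downsample(x, N):
    """y[n] = x[nN]."""
    return x[::N]


N = 3
x = [0.5, -1.0, 2.0, 0.25, 3.0, -0.75, 1.5, 0.0, -2.0, 1.0]
y = downsample(x, N)
w = 0.9 * cmath.exp(0.4j)              # arbitrary evaluation point
W = cmath.exp(-2j * cmath.pi / N)      # W_N = e^{-j 2 pi / N}
lhs = z_transform(y, w ** N)
rhs = sum(z_transform(x, W ** k * w) for k in range(N)) / N
print(lhs, rhs)  # agree up to rounding error
```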

By way of an example, we show the case N = 3 in Figure 2.5. It is obvious
that in order to avoid aliasing, downsampling by N should be preceded by an ideal
lowpass filter with cutoff frequency π/N (see Figure 2.6(a)). Its impulse response
h[n] is given by

$$ h[n] = \frac{1}{2\pi} \int_{-\pi/N}^{\pi/N} e^{j\omega n}\, d\omega = \frac{\sin(\pi n/N)}{\pi n}. \tag{2.5.16} $$

Figure 2.6 Sampling rate changes. (a) Downsampling by N preceded by ideal
lowpass filtering with cutoff frequency π/N. (b) Upsampling by M followed
by interpolation with an ideal lowpass filter with cutoff frequency π/M. (c)
Sampling rate change by a rational factor M/N, with an interpolation filter in
between. The cutoff frequency is the lesser of π/M and π/N.



The converse of downsampling is upsampling by an integer M. That is, to obtain a
new sequence, one simply inserts M − 1 zeros between consecutive samples of the
input sequence, or

$$ y[n] = \begin{cases} x[n/M] & n = kM,\ k \in \mathbb{Z}, \\ 0 & \text{otherwise}. \end{cases} $$

In the Fourier domain, this amounts to

$$ Y(e^{j\omega}) = X(e^{jM\omega}), \tag{2.5.17} $$

and similarly, in the z-transform domain

$$ Y(z) = X(z^M). \tag{2.5.18} $$



Due to upsampling, the spectrum contracts by M. Besides the "base spectrum"
at multiples of 2π, there are spectral images in between which are due to the
interleaving of zeros in the upsampling. To get rid of these spectral images, a
perfect interpolator or a lowpass filter with cutoff frequency π/M has to be used,
as shown in Figure 2.6(b). Its impulse response is as given in (2.5.16), but with a
different scale factor,

$$ h[n] = \frac{\sin(\pi n/M)}{\pi n/M}. $$

It is easy to see that h[nM] = δ[n]. Therefore, calling u[n] the result of the
interpolation, or u[n] = y[n] * h[n], it follows that u[nM] = x[n]. Thus, u[n] is a
perfect interpolation of x[n] in the sense that the missing samples have been filled
in without disturbing the original ones.
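The interpolation property h[nM] = δ[n], and with it u[nM] = x[n], can be verified numerically. The ideal filter is infinitely long, so the sketch truncates it to |n| ≤ K; that truncation is an assumption which only affects the interpolated values, not the samples at multiples of M. Helper names are illustrative.

```python
import math


def ideal_interp_filter(n, M):
    """h[n] = sin(pi n / M) / (pi n / M): ideal lowpass, cutoff pi/M, gain M."""
    if n == 0:
        return 1.0
    t = math.pi * n / M
    return math.sin(t) / t


def upsample(x, M):
    """Insert M - 1 zeros between consecutive samples."""
    y = [0.0] * (len(x) * M)
    y[::M] = x
    return y


def interpolate(x, M, K=200):
    """u = (upsampled x) * h, with h truncated to |n| <= K."""
    y = upsample(x, M)
    return [
        sum(y[k] * ideal_interp_filter(n - k, M) for k in range(len(y)) if abs(n - k) <= K)
        for n in range(len(y))
    ]


M = 4
x = [1.0, -0.5, 2.0, 0.75, -1.25]
u = interpolate(x, M)
print([round(v, 12) for v in u[::M]])  # the original samples reappear
```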

A rational sampling rate change by M/N is obtained by cascading upsampling 
and downsampling with an interpolation filter in the middle, as shown in Figure 
2.6(c). The interpolation filter is the cascade of the ideal lowpass for the upsampling 
and for the downsampling, that is, the narrower of the two in the ideal filter case. 

Finally, we demonstrate a fact that will be extensively used in Chapter 3. It
can be seen as an application of downsampling followed by upsampling to the
deterministic autocorrelation of g[n]. This is the discrete-time equivalent of (2.4.31).
We want to show that the following holds:

$$ \langle g[n], g[n+Nl] \rangle = \delta[l] \;\longleftrightarrow\; \sum_{k=0}^{N-1} G(W_N^k z)\, G_*(W_N^{-k} z^{-1}) = N. \tag{2.5.19} $$

The left side of the above equation is simply the autocorrelation of g[n] evaluated
at every Nth index m = Nl. If we denote the autocorrelation of g[n] as p[n], then
the left side of (2.5.19) is p'[n] = p[Nn]. The z-transform of p'[n] is (apply (2.5.14))

$$ P'(z) = \frac{1}{N} \sum_{k=0}^{N-1} P\big(W_N^k z^{1/N}\big). $$

The left side of (2.5.19) states that p'[n] = δ[n], that is, P'(z) = 1. Replacing now
z^{1/N} by z, and using the fact that the z-transform of p[n] is P(z) = G(z) G_*(z^{-1}),
we obtain exactly the right side of (2.5.19).
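As a check of (2.5.19), take N = 2 and the Haar filter g = [1/√2, 1/√2] (an illustrative choice). Its autocorrelation is p[0] = 1, p[±1] = 1/2, so p[2l] = δ[l], and the modulation-domain sum should equal 2 at any z ≠ 0. A filter violating the time-domain condition, such as g = [1, 0.5, 0.25] with p[2] = 0.25, does not give the constant 2.

```python
import cmath
import math


def z_transform(g, z):
    """G(z) = sum_n g[n] z^-n for a finite causal sequence."""
    return sum(gn * z ** (-n) for n, gn in enumerate(g))


def modulation_sum(g, N, z):
    """sum_k G(W^k z) G_*(W^-k z^-1), with W = e^{-j 2 pi / N}."""
    W = cmath.exp(-2j * cmath.pi / N)
    g_conj = [complex(gn).conjugate() for gn in g]  # coefficients of G_*
    return sum(
        z_transform(g, W ** k * z) * z_transform(g_conj, W ** (-k) / z)
        for k in range(N)
    )


haar = [1 / math.sqrt(2), 1 / math.sqrt(2)]
for z in (0.7 + 0.2j, cmath.exp(1.3j), -2.0 + 0.5j):
    print(modulation_sum(haar, 2, z))  # approximately 2

not_orthogonal = [1.0, 0.5, 0.25]  # p[2] = 0.25, so p[2l] != delta[l]
print(modulation_sum(not_orthogonal, 2, 0.7 + 0.2j))  # not equal to 2
```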

Multirate Identities 

Commutativity of Sampling Rate Changes Upsampling by M and downsampling by
N commute if and only if M and N are coprime.

The relation is shown pictorially in Figure 2.7(a). Using (2.5.14) and (2.5.18)
for down and upsampling in the z-domain, we find that upsampling by M followed by
downsampling by N leads to

$$ Y_{u/d}(z) = \frac{1}{N} \sum_{k=0}^{N-1} X\big(W_N^{kM} z^{M/N}\big), $$

while the reverse order leads to

$$ Y_{d/u}(z) = \frac{1}{N} \sum_{k=0}^{N-1} X\big(W_N^{k} z^{M/N}\big). $$

For the two expressions to be equal, kM mod N has to be a permutation, that is,
kM mod N = l has to have a unique solution for all l ∈ {0, ..., N − 1}. If M and N
have a common factor L > 1, then M = M'L and N = N'L. Note that (kM mod
N) mod L is zero, or kM mod N is a multiple of L and thus not a permutation.
If M and N are coprime, then Bezout's identity [209] guarantees that there exist
two integers m and n such that mM + nN = 1. It follows that mM mod N = 1;
thus, k = ml mod N is the desired solution to the equation kM mod N = l. This
property has an interesting generalization in multiple dimensions (see for example
[152]).

Figure 2.7 Multirate identities. (a) Commutativity of up and downsampling
(M and N coprime). (b) Interchange of downsampling and filtering. (c) Interchange
of filtering and upsampling.
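The coprimality condition is easy to test on sequences. The sketch applies upsampling by M and downsampling by N in both orders to a nonzero test input and compares the outputs over their common length. Helper names and the (M, N) pairs are illustrative.

```python
import math


def upsample(x, M):
    y = [0] * (len(x) * M)
    y[::M] = x
    return y


def downsample(x, N):
    return x[::N]


def commute(M, N, length=60):
    """Do up-by-M then down-by-N and down-by-N then up-by-M give the same output?"""
    x = list(range(1, length + 1))        # nonzero test input
    up_down = downsample(upsample(x, M), N)
    down_up = upsample(downsample(x, N), M)
    n = min(len(up_down), len(down_up))   # compare on the common support
    return up_down[:n] == down_up[:n]


for M, N in [(2, 3), (3, 4), (2, 4), (6, 9), (5, 7)]:
    print(M, N, math.gcd(M, N) == 1, commute(M, N))
```

In the time domain, up/down keeps x[nN/M] whenever M divides nN, while down/up keeps it only when M divides n; the two conditions coincide exactly when gcd(M, N) = 1.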

Interchange of Filtering and Downsampling Downsampling by N followed by filtering
with a filter having z-transform H(z) is equivalent to filtering with the upsampled
filter H(z^N) before the downsampling.

Using (2.5.14), it follows that downsampling the filtered signal with the
z-transform X(z)H(z^N) results in

$$ \frac{1}{N} \sum_{k=0}^{N-1} X\big(W_N^k z^{1/N}\big)\, H\big((W_N^k z^{1/N})^N\big) = H(z)\, \frac{1}{N} \sum_{k=0}^{N-1} X\big(W_N^k z^{1/N}\big), $$

since W_N^{kN} = 1, which is equal to filtering a downsampled version of X(z) with H(z).

Interchange of Filtering and Upsampling Filtering with a filter having the z-transform
H(z), followed by upsampling by N, is equivalent to upsampling followed by filtering
with H(z^N).

Using (2.5.18), it is immediate that both systems lead to an output with
z-transform X(z^N)H(z^N) when the input is X(z).
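Both interchange identities can be checked on finite integer sequences with plain convolution, where the taps of H(z^N) are obtained by upsampling h. The sequences are illustrative, and outputs are compared after zero-padding the shorter one, since trailing zeros may differ in length.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def upsample(x, N):
    y = [0] * (len(x) * N)
    y[::N] = x
    return y


def downsample(x, N):
    return x[::N]


def same(a, b):
    """Equal after zero-padding the shorter sequence."""
    n = max(len(a), len(b))
    return a + [0] * (n - len(a)) == b + [0] * (n - len(b))


N = 3
x = [4, -1, 2, 7, 0, 3, -5, 1, 6, 2, -2]
h = [1, 2, -3, 1]
h_up = upsample(h, N)  # taps of H(z^N)

# Filtering with H(z^N) then downsampling == downsampling then filtering with H(z).
print(same(downsample(convolve(x, h_up), N), convolve(downsample(x, N), h)))  # True
# Filtering with H(z) then upsampling == upsampling then filtering with H(z^N).
print(same(upsample(convolve(x, h), N), convolve(upsample(x, N), h_up)))  # True
```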

In short, the last two properties simply say that filtering in the downsampled
domain can always be realized by filtering in the upsampled domain, but then with
the upsampled filter (down and upsampled stand for low versus high sampling rate
domain). The last two relations are shown in Figures 2.7(b) and (c).

Figure 2.8 Polyphase transform (forward and inverse transforms for the case
N = 3 are shown).



Polyphase Transform Recall that in a time-invariant system, if input x[n] produces
output y[n], then input x[n + m] will produce output y[n + m]. In a time-varying
system this is not true. However, there exist periodically time-varying
systems for which if input x[n] produces output y[n], then x[n + Nm] produces
output y[n + Nm]. These systems are periodically time-varying with period N. For
example, a downsampler by N followed by an upsampler by N is such a system. A
downsampler alone is also periodically time-varying, but with a time-scale change.
Then, if x[n] produces y[n], x[n + mN] produces y[n + m] (note that x[n] and y[n]
do not live on the same time-scale). Such periodically time-varying systems can
be analyzed with a simple but useful transform where a sequence is mapped into
N sequences with each being a shifted and downsampled version of the original
sequence. Obviously, the original sequence can be recovered by simply interleaving
the subsequences. Such a transform is called a polyphase transform of size N since
each subsequence has a different phase and there are N of them. The simplest
example is the case N = 2, where a sequence is subdivided into samples of even
and odd indexes, respectively. In general, we define the size-N polyphase transform
of a sequence x[n] as a vector of sequences (x_0[n], x_1[n], ..., x_{N−1}[n]), where

$$ x_i[n] = x[nN + i]. $$

These are called signal polyphase components. In the z-transform domain, we can write
X(z) as the sum of shifted and upsampled polyphase components. That is,

$$ X(z) = \sum_{i=0}^{N-1} z^{-i} X_i(z^N), \tag{2.5.20} $$

where

$$ X_i(z) = \sum_{n=-\infty}^{\infty} x[nN + i]\, z^{-n}. \tag{2.5.21} $$



Figure 2.8 shows the signal polyphase transform and its inverse (for the case N = 3).
Because the forward shift requires advance operators which are noncausal, a causal
version would produce a total delay of N − 1 samples between forward and inverse
polyphase transform. Such a causal version is obtained by multiplying the noncausal
forward polyphase transform by z^{−N+1}.
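A minimal sketch of the size-N polyphase transform x_i[n] = x[nN + i], its inverse (interleaving), and a numerical check of (2.5.20) at one point of the z-plane. Finite sequences are assumed, with the last block zero-padded when the length is not a multiple of N; the helper names are illustrative.

```python
import cmath


def polyphase(x, N):
    """Split x into N components x_i[n] = x[nN + i] (zero-padding the last block)."""
    padded = list(x) + [0] * (-len(x) % N)
    return [padded[i::N] for i in range(N)]


def inverse_polyphase(components):
    """Interleave the components back into one sequence."""
    N = len(components)
    out = [0] * (N * len(components[0]))
    for i, xi in enumerate(components):
        out[i::N] = xi
    return out


def z_transform(x, z):
    return sum(xn * z ** (-n) for n, xn in enumerate(x))


N = 3
x = [5, 1, -2, 7, 3, 0, -4, 8]
comps = polyphase(x, N)
print(comps)                                   # [[5, 7, -4], [1, 3, 8], [-2, 0, 0]]
print(inverse_polyphase(comps)[:len(x)] == x)  # True

# Check (2.5.20): X(z) = sum_i z^-i X_i(z^N) at an arbitrary point z.
z = 1.2 * cmath.exp(0.5j)
lhs = z_transform(x, z)
rhs = sum(z ** (-i) * z_transform(xi, z ** N) for i, xi in enumerate(comps))
print(abs(lhs - rhs) < 1e-12)                  # True
```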

Later we will need to express the output of filtering with H followed by
downsampling in terms of the polyphase components of the input signal. That is, we
need the 0th polyphase component of H(z)X(z). This is easiest if we define a
polyphase decomposition of the filter to have the reverse phase of the one used for
the signal, or

$$ H(z) = \sum_{i=0}^{N-1} z^{i} H_i(z^N), \tag{2.5.22} $$

with

$$ H_i(z) = \sum_{n=-\infty}^{\infty} h[Nn - i]\, z^{-n}, \qquad i = 0, \ldots, N-1. \tag{2.5.23} $$

Then the product H(z)X(z) after downsampling by N becomes (in the product of
(2.5.22) and (2.5.20), only the terms with equal indices carry powers of z that are
multiples of N)

$$ Y(z) = \sum_{i=0}^{N-1} H_i(z)\, X_i(z). $$
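The polyphase form of filtering followed by downsampling can be checked against the direct computation y[n] = Σ_k h[k] x[nN − k]. Following (2.5.22) and (2.5.23), the filter components are h_i[m] = h[mN − i] and the signal components are x_i[n] = x[nN + i], so y[n] = Σ_i Σ_m h_i[m] x_i[n − m]. Causal finite integer sequences are assumed; the helper names are illustrative.

```python
def _get(seq, k):
    """seq[k] for 0 <= k < len(seq), and 0 outside (finite causal sequence)."""
    return seq[k] if 0 <= k < len(seq) else 0


def filter_downsample_direct(h, x, N):
    """y[n] = sum_k h[k] x[nN - k]: filter with h, then keep every N-th sample."""
    n_out = (len(x) + len(h) - 2) // N + 1
    return [sum(h[k] * _get(x, n * N - k) for k in range(len(h))) for n in range(n_out)]


def filter_downsample_polyphase(h, x, N):
    """y[n] = sum_i sum_m h_i[m] x_i[n - m], h_i[m] = h[mN - i], x_i[n] = x[nN + i]."""
    n_out = (len(x) + len(h) - 2) // N + 1
    y = [0] * n_out
    for i in range(N):
        for n in range(n_out):
            for m in range(n + 1):  # h_i[m] = 0 for m < 0, x_i[n - m] = 0 for m > n
                y[n] += _get(h, m * N - i) * _get(x, (n - m) * N + i)
    return y


N = 3
h = [1, -2, 4, 3, -1]
x = [2, 0, 5, -3, 1, 7, -2, 4]
print(filter_downsample_direct(h, x, N) == filter_downsample_polyphase(h, x, N))  # True
```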



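The same operation (filtering by h[n] followed by downsampling by N) can be
expressed in matrix notation as

$$
\begin{pmatrix} \vdots \\ y[0] \\ y[1] \\ \vdots \end{pmatrix}
=
\begin{pmatrix}
\ddots & & & & & \\
\cdots & h[L-1] & \cdots & h[L-N] & h[L-N-1] & \cdots \\
\cdots & 0 & \cdots & 0 & h[L-1] & \cdots \\
 & & & & & \ddots
\end{pmatrix}
\begin{pmatrix} \vdots \\ x[0] \\ x[1] \\ \vdots \end{pmatrix},
$$

where L is the filter length, and the matrix operator will be denoted by H. Similarly,
upsampling by N followed by filtering by g[n] can be expressed as

$$
\begin{pmatrix} \vdots \\ y[0] \\ y[1] \\ \vdots \\ y[N-1] \\ y[N] \\ \vdots \end{pmatrix}
=
\begin{pmatrix}
\ddots & & & \\
\cdots & g[0] & 0 & \cdots \\
\cdots & g[1] & 0 & \cdots \\
 & \vdots & \vdots & \\
\cdots & g[N-1] & 0 & \cdots \\
\cdots & g[N] & g[0] & \cdots \\
 & \vdots & \vdots & \ddots
\end{pmatrix}
\begin{pmatrix} \vdots \\ x[0] \\ x[1] \\ \vdots \end{pmatrix}.
$$

Here the matrix operator is denoted by G. Note that if h[n] = g[−n], then H = G^T,
a fact that will be important when analyzing orthonormal filter banks in Chapter 3.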



2.6 Time-Frequency Representations 

While the Fourier transform and its variations are very useful mathematical tools,
practical applications require basic modifications. These modifications aim at
"localizing" the analysis, so that it is not necessary to have the signal over (−∞, ∞)
to perform the transform (as required with the Fourier integral) and so that local
effects (transients) can be captured with some accuracy. The classic example is the
short-time Fourier [204], or Gabor transform¹⁴ [102], which uses windowed complex
exponentials and their translates as expansion functions. We therefore discuss the
localization properties of basis functions and derive the uncertainty principle, which
gives a lower bound on the joint time and frequency resolutions. We then review the
short-time Fourier transform and its associated energy distribution called the
spectrogram, and introduce the wavelet transform. Block transforms are also discussed.
Finally, an example of a bilinear expansion, namely the Wigner-Ville distribution,
is presented.

2.6.1 Frequency, Scale and Resolution 

When calculating a signal expansion, a primary concern is the localization of a 
given basis function in time and frequency. For example, in the Fourier transform, 
the functions used in the analysis are infinitely sharp in their frequency localization 
(they exist at one precise frequency) but have no time localization because of their 
infinite extent. 

There are various ways to define the localization of a particular basis function,
but they are all related to the "spread" of the function in time and frequency. For
example, one can define intervals I_t and I_ω which contain 90% of the energy of
the time- and frequency-domain functions, respectively, and are centered around
the center of gravity of |f(t)|² and |F(ω)|². This defines what we call a tile in the
time-frequency domain, as shown in Figure 2.9. For simplicity, we assumed a
complex basis function. A real basis function would be represented by two mirror
tiles at positive and negative frequencies.

¹⁴ Gabor's original paper proposed synthesis of signals using complex sinusoids windowed by a
Gaussian, and is thus a synthesis rather than an analysis tool. However, it is closely related to the
short-time Fourier transform, and we call Gabor transform a short-time Fourier transform using a
Gaussian window.

Figure 2.9 Tile in the time-frequency plane as an approximation of the
time-frequency localization of f(t). Intervals I_t and I_ω contain 90% of the energy
of the time- and frequency-domain functions, respectively.

Figure 2.10 Elementary operations on a basis function f and effect on the
time-frequency tile. (a) Shift in time by τ producing f' and modulation by ω_0
producing f''. (b) Scaling f'(t) = f(at) (a = 1/3 is shown).

Consider now elementary operations on a basis function and their effects on the
tile. Obviously, a shift in time by τ results in shifting of the tile by τ. Similarly,
modulation by e^{jω_0 t} shifts the tile by ω_0 in frequency (vertically). This is shown
in Figure 2.10(a). Finally, scaling by a, or f'(t) = f(at), results in I'_t = (1/a)I_t
and I'_ω = aI_ω, following the scaling property of the Fourier transform (2.4.5). That
is, both the shape and localization of the tile have been affected, as shown in
Figure 2.10(b). Note that all elementary operations conserve the surface of the
time-frequency tile. In the scaling case, resolution in frequency was traded for
resolution in time.

Since scaling is a fundamental operation used in the wavelet transform, we need
to define it properly. While frequency has a natural ordering, the notion of scale
is defined differently by different authors. The analysis functions for the wavelet
transform will be defined as

$$ \psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left( \frac{t-b}{a} \right), \qquad a \in \mathbb{R}^+,\ b \in \mathbb{R}, $$

where the function ψ(t) is usually a bandpass filter. Thus, large a's (a ≫ 1)
correspond to long basis functions, and will identify long-term trends in the signal
to be analyzed. Small a's (0 < a < 1) lead to short basis functions, which will follow
short-term behavior of the signal. This leads to the following: Scale is proportional
to the duration of the basis functions used in the signal expansion.

Because of this, and assuming that a basis function is a bandpass filter as in 
wavelet analysis, high-frequency basis functions are obtained by going to small 
scales, and therefore, scale is loosely related to inverse frequency. This is only 
a qualitative statement, since scaling and modulation are fundamentally different 
operations, as was seen in Figure 2.10. The scale discussed here is similar to that of
geographical maps, where large means a coarse, global view, and small corresponds
to a fine, detailed view.

Scale changes can be inverted if the function is continuous-time. In discrete 
time, the situation is more complicated. From the discussion of multirate signal 
processing in Section 2.5.3, we can see that upsampling (that is, a stretching of the 
sequence) can be undone by downsampling by the same factor, with no
loss of information if done properly. Downsampling (or contraction of a sequence)
involves loss of information in general, since either a bandlimitation precedes the 
downsampling, or aliasing occurs. This naturally leads to the notion of resolution of 
a signal. We will thus say that the resolution of a finite-length signal is the minimum 
number of samples required to represent it. It is thus related to the information 
content of the signal. For infinite-length signals having finite energy and sufficient 
decay, one can define the length as the essential support (for example, where 99% 
of the energy is). 

In continuous time, scaling does not change the resolution, since a scale change 
affects both the sampling rate and the length of the signal, thus keeping the number 
of samples constant. In discrete time, upsampling followed by interpolation does
not affect the resolution, since the interpolated samples are redundant.
Downsampling by N decreases the resolution by N, and cannot be undone. Figure 2.11
shows the interplay of scale and resolution on simple discrete-time examples. Note
that the notion of resolution is central to multiresolution analysis developed in
Chapters 3 and 4. There, the key idea is to split a signal into several lower-resolution
components, from which the original, full-resolution signal can be recovered.

Figure 2.11 Scale and resolution in discrete-time sequences. (a) Lowpass
filtering (halfband lowpass) reduces the resolution: resolution halved, scale
unchanged. (b) Upsampling and interpolation (halfband lowpass) change the scale
but not the resolution: resolution unchanged, scale halved. (c) Lowpass filtering
(halfband lowpass) and downsampling increase the scale and reduce the resolution:
resolution halved, scale doubled.

2.6.2 Uncertainty Principle 

As indicated in the discussion of scaling in the previous section, sharpness of the 
time analysis can be traded off for sharpness in frequency, and vice versa. But 
there is no way to get arbitrarily sharp analysis in both domains simultaneously, as 
shown below [37, 102, 215]. Note that the sharpness is also called resolution in time 
and frequency (but is different from the resolution discussed just above, which was 
related to information content). Consider a unit energy signal f(t) with Fourier
transform F(ω) centered around the origin in time as well as in frequency, that is,
satisfying ∫ t|f(t)|² dt = 0 and ∫ ω|F(ω)|² dω = 0 (this can always be obtained by
appropriate translation and modulation). Define the time width Δ_t of f(t) and its
frequency width Δ_ω by

$$ \Delta_t^2 = \int_{-\infty}^{\infty} t^2 |f(t)|^2\, dt, \qquad \Delta_\omega^2 = \int_{-\infty}^{\infty} \omega^2 |F(\omega)|^2\, d\omega. \tag{2.6.1} $$

THEOREM 2.7 Uncertainty Principle

If f(t) vanishes faster than 1/√t as t → ±∞, then

$$ \Delta_t^2\, \Delta_\omega^2 \ge \frac{\pi}{2}, \tag{2.6.2} $$

where equality holds only for Gaussian signals

$$ f(t) = \left( \frac{2\alpha}{\pi} \right)^{1/4} e^{-\alpha t^2}, \qquad \alpha > 0. \tag{2.6.3} $$



Proof

Consider the integral of t f(t) f'(t). Using the Cauchy-Schwarz inequality (2.2.2),

$$ \left| \int_{-\infty}^{\infty} t f(t)\, f'(t)\, dt \right|^2 \le \int_{-\infty}^{\infty} |t f(t)|^2\, dt \int_{-\infty}^{\infty} |f'(t)|^2\, dt. \tag{2.6.4} $$

The first integral on the right side is equal to Δ_t². Because f'(t) has Fourier
transform jωF(ω), and using Parseval's formula, we find that the second integral is
equal to (1/(2π))Δ_ω². Thus, the left side of (2.6.4) is bounded from above by
(1/(2π))Δ_t²Δ_ω². Using integration by parts, and noting that
f(t)f'(t) = (1/2)(df²(t)/dt),

$$ \int_{-\infty}^{\infty} t f(t)\, f'(t)\, dt = \frac{1}{2} \int_{-\infty}^{\infty} t\, \frac{d f^2(t)}{dt}\, dt = \frac{1}{2}\, t f^2(t) \Big|_{-\infty}^{\infty} - \frac{1}{2} \int_{-\infty}^{\infty} f^2(t)\, dt. $$

By assumption, the limit of t f²(t) is zero at infinity, and, because the function is
of unit norm, the above equals −1/2. Replacing this into (2.6.4), we obtain

$$ \frac{1}{4} \le \frac{1}{2\pi}\, \Delta_t^2\, \Delta_\omega^2, $$

or (2.6.2). To find a function that meets the lower bound, note that the Cauchy-Schwarz
inequality is an equality when the two functions involved are equal within a
multiplicative factor, that is, from (2.6.4),

$$ f'(t) = k\, t f(t). $$

Thus, f(t) is of the form

$$ f(t) = c\, e^{k t^2/2}, \tag{2.6.5} $$

and (2.6.3) follows for k = −2α (with α > 0, so that f has finite energy) and
c = (2α/π)^{1/4}, which makes f of unit energy.
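A numerical sketch of Theorem 2.7 for real signals. By Parseval, Δ_ω² = 2π∫|f'(t)|² dt (since f' has Fourier transform jωF(ω)), so both widths can be computed in the time domain. The code divides by the energy so f need not be normalized, and uses a midpoint rule on a truncated interval (an approximation). Two Gaussians of different widths both give π/2, illustrating that scaling leaves the product unchanged; the two-sided exponential e^{−|t|}, an arbitrary non-Gaussian example, gives π.

```python
import math


def width_product(f, df, T=20.0, steps=40_000):
    """Delta_t^2 * Delta_w^2 for a real, centered f, by the midpoint rule on [-T, T].

    Uses Delta_w^2 = 2 pi * integral |f'(t)|^2 dt and normalizes by the energy.
    """
    dt = 2 * T / steps
    energy = spread_t = energy_deriv = 0.0
    for i in range(steps):
        t = -T + (i + 0.5) * dt
        ft, dft = f(t), df(t)
        energy += ft * ft * dt
        spread_t += t * t * ft * ft * dt
        energy_deriv += dft * dft * dt
    delta_t2 = spread_t / energy
    delta_w2 = 2 * math.pi * energy_deriv / energy
    return delta_t2 * delta_w2


def gaussian(alpha):
    """f(t) = exp(-alpha t^2) and its derivative (not normalized)."""
    return (lambda t: math.exp(-alpha * t * t),
            lambda t: -2 * alpha * t * math.exp(-alpha * t * t))


def two_sided_exponential():
    """f(t) = exp(-|t|) and its derivative (for t != 0)."""
    return (lambda t: math.exp(-abs(t)),
            lambda t: -math.copysign(1.0, t) * math.exp(-abs(t)))


for alpha in (0.5, 3.0):
    print(alpha, width_product(*gaussian(alpha)))  # close to pi/2 for both widths
print(width_product(*two_sided_exponential()))     # close to pi, above the bound
```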

The uncertainty principle is fundamental since it sets a bound on the maximum
joint sharpness or resolution in time and frequency of any linear transform. It is
easy to check that scaling does not change the time-bandwidth product; it only
exchanges one resolution for the other, similarly to what was shown in Figure 2.10.

Example 2.3 Prolate Spheroidal Wave Functions

A related problem is that of finding bandlimited functions which are maximally concentrated
around the origin in time (recall that there exist no functions that are both bandlimited
and of finite duration). That is, find a function f(t) of unit norm and bandlimited to ω_0
(F(ω) = 0 for |ω| > ω_0) such that, for a given T ∈ (0, ∞),

$$ \int_{-T}^{T} |f(t)|^2\, dt $$

is maximized. It can be shown [216, 268] that the solution f(t) is the eigenfunction with
the largest eigenvalue satisfying

$$ \int_{-T}^{T} f(\tau)\, \frac{\sin \omega_0 (t - \tau)}{\pi (t - \tau)}\, d\tau = \lambda f(t). \tag{2.6.6} $$

An interpretation of the above formula is the following. If T → ∞, then we have the
usual convolution with an ideal lowpass filter, and thus, any bandlimited function is an
eigenfunction with eigenvalue 1. For finite T, because of the truncation, the eigenvalues will
be strictly smaller than one. Actually, it turns out that the eigenvalues belong to (0, 1) and
are all different, or

$$ 1 > \lambda_0 > \lambda_1 > \cdots > \lambda_n \to 0, \quad n \to \infty. $$

Call f_n(t) the eigenfunction of (2.6.6) with eigenvalue λ_n. Then (i) each f_n(t) is unique (up
to a scale factor), (ii) f_n(t) and f_m(t) are orthogonal for n ≠ m, and (iii) with proper
normalization the set {f_n(t)} forms an orthonormal basis for functions bandlimited to
(−ω_0, ω_0) [216]. These functions are called prolate spheroidal wave functions. Note that
while (2.6.6) seems to depend on both T and ω_0, the solution depends only on the product
T · ω_0.

2.6.3 Short-Time Fourier Transform 

To achieve a "local" Fourier transform, one can define a windowed Fourier trans- 
form. The signal is first multiplied by a window function w{t — r) and then the usual 
Fourier transform is taken. This results in a two-indexed transform, STFTf(uj,T), 
given by 

/oo 
w*(t-r) f(t)e- ]u;t dt. 
-oo 

That is, one measures the similarity between the signal and shifts and modulates 
of an elementary window, or 

STFT f (u,T) = (g u , T (t), f(t)), 

where 

Thus, each elementary function used in the expansion has the same time and
frequency resolution, simply a different location in the time-frequency plane. It is
thus natural to discretize the STFT on a rectangular grid (mω_0, nτ_0). If the window
function is a lowpass filter with a cutoff frequency of ω_b, or a bandwidth of 2ω_b,
then ω_0 is chosen smaller than 2ω_b and τ_0 smaller than π/ω_b in order to get an
adequate sampling. Typically, the STFT is actually oversampled. A more detailed
discussion of the sampling of the STFT is given in Section 5.2, where the inversion
formula is also given. A real-valued version of the STFT, using cosine modulation
and an appropriate window, leads to orthonormal bases, which are discussed in
Section 4.8.

Figure 2.12 The short-time Fourier and wavelet transforms. (a) Modulates
and shifts of a Gaussian window used in the expansion. (b) Tiling of the
time-frequency plane. (c) Shifts and scales of the prototype bandpass wavelet.
(d) Tiling of the time-frequency plane.

Examples of STFT basis functions and the tiling of the time-frequency plane
are given in Figures 2.12(a) and (b). To achieve good time-frequency resolution, a
Gaussian window (see (2.6.5)) can be used, as originally proposed by Gabor [102].
Thus, the STFT is often also called the Gabor transform.

The spectrogram is the energy distribution associated with the STFT, that is,

$$S(\omega, \tau) = |\mathrm{STFT}_f(\omega, \tau)|^2. \tag{2.6.7}$$
Because the STFT can be thought of as a bank of filters with impulse responses
$g^*_{\omega,0}(-t) = w^*(-t)\, e^{j\omega t}$ (the filter output at time $\tau$ equals
$e^{j\omega\tau}\, \mathrm{STFT}_f(\omega, \tau)$), the spectrogram is the magnitude squared of the filter
outputs.
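The filter-bank reading of the spectrogram can be checked in discrete time. The sketch below
assumes a truncated Gaussian window (its width `SIGMA` and half-length `L` are illustrative
values) and compares $|\mathrm{STFT}|^2$ computed directly from the definition with the squared
magnitude of the outputs of bandpass filters $h_\omega[m] = w^*(-m)\, e^{j\omega m}$. Since the
filter output at time $\tau$ equals $e^{j\omega\tau}\, \mathrm{STFT}_f(\omega, \tau)$, the two magnitudes
agree. All function names are hypothetical.

```python
import numpy as np

L = 32        # window half-length in samples (illustrative)
SIGMA = 8.0   # Gaussian width in samples (illustrative)

def window(t):
    """Truncated Gaussian window: exp(-t^2 / (2 SIGMA^2)) for |t| <= L, else 0."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= L, np.exp(-t**2 / (2 * SIGMA**2)), 0.0)

def stft_direct(x, omegas):
    """STFT[i, tau] = sum_n w(n - tau) x[n] exp(-j omega_i n); real window, so w* = w."""
    n = np.arange(len(x))
    S = np.empty((len(omegas), len(x)), dtype=complex)
    for i, om in enumerate(omegas):
        for tau in range(len(x)):
            S[i, tau] = np.sum(window(n - tau) * x * np.exp(-1j * om * n))
    return S

def spectrogram_filterbank(x, omegas):
    """Squared magnitude of the outputs of filters h_omega[m] = w(-m) exp(j omega m)."""
    m = np.arange(-L, L + 1)
    S = np.empty((len(omegas), len(x)))
    for i, om in enumerate(omegas):
        h = window(-m) * np.exp(1j * om * m)
        # full convolution, shifted so that y[tau] = sum_n x[n] h_omega[tau - n]
        y = np.convolve(x, h)[L:L + len(x)]
        S[i] = np.abs(y) ** 2
    return S

n = np.arange(128)
x = np.cos(0.6 * n)
omegas = np.linspace(0, np.pi, 33)
S_direct = np.abs(stft_direct(x, omegas)) ** 2
S_bank = spectrogram_filterbank(x, omegas)
```

For this pure tone, each column of the spectrogram peaks at the grid frequency closest to 0.6.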

2.6.4 Wavelet Transform 

Instead of shifts and modulates of a prototype function, one can choose shifts and 
scales, and obtain a constant relative bandwidth analysis known as the wavelet 
transform. To achieve this, take a real bandpass filter with impulse response $\psi(t)$
and zero mean

$$\int_{-\infty}^{\infty} \psi(t)\, dt = \Psi(0) = 0.$$

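Then, define the continuous wavelet transform as

$$\mathrm{CWT}_f(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} \psi^*\!\left(\frac{t - b}{a}\right) f(t)\, dt, \tag{2.6.8}$$

where $a \in \mathbb{R}^+$ and $b \in \mathbb{R}$. That is, we measure the similarity between the signal
$f(t)$ and shifts and scales of an elementary function, since

$$\mathrm{CWT}_f(a, b) = \langle \psi_{a,b}(t), f(t) \rangle,$$

where

$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t - b}{a}\right)$$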

and the factor $1/\sqrt{a}$ is used to conserve the norm. Now, the functions used in
the expansion have changing time-frequency tiles because of the scaling. For small
$a$ ($a < 1$), $\psi_{a,b}(t)$ will be short and of high frequency, while for large $a$ ($a > 1$),
$\psi_{a,b}(t)$ will be long and of low frequency. Thus, a natural discretization will use
large time steps for large $a$ and, conversely, fine time steps for small $a$. The
discretization of $(a, b)$ is then of the form $(a_0^m, n\, a_0^m \tau_0)$, and leads to functions for the
expansion as shown in Figure 2.12(c). The resulting tiling of the time-frequency
plane is shown in Figure 2.12(d) (the case $a_0 = 2$ is shown). Special choices for
$\psi(t)$ and the discretization lead to orthonormal bases or wavelet series as studied
in Chapter 4, while the overcomplete, continuous wavelet transform in (2.6.8) is
discussed in Section 5.1.
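A discretized version of (2.6.8) can be sketched with a Riemann sum. The sketch assumes the
"Mexican hat" $\psi(t) = (1 - t^2) e^{-t^2/2}$, a real, zero-mean wavelet chosen only for
illustration; the grid and the function names are hypothetical. It checks that the factor
$1/\sqrt{a}$ keeps $\|\psi_{a,b}\|$ independent of $a$, and that analyzing $\psi_{2,3}$ itself
gives the largest coefficient at $(a, b) = (2, 3)$, as the Cauchy-Schwarz inequality predicts.

```python
import numpy as np

def mexican_hat(t):
    """Real, zero-mean wavelet (1 - t^2) exp(-t^2 / 2); an illustrative choice of psi."""
    return (1 - t**2) * np.exp(-t**2 / 2)

def psi_ab(t, a, b, psi=mexican_hat):
    """Scaled and shifted wavelet psi_{a,b}(t) = psi((t - b) / a) / sqrt(a)."""
    return psi((t - b) / a) / np.sqrt(a)

def cwt(f_vals, t, scales, shifts, psi=mexican_hat):
    """Riemann-sum approximation of (2.6.8) for real f and real psi (so psi* = psi)."""
    dt = t[1] - t[0]
    W = np.empty((len(scales), len(shifts)))
    for i, a in enumerate(scales):
        for k, b in enumerate(shifts):
            W[i, k] = np.sum(psi_ab(t, a, b, psi) * f_vals) * dt
    return W

t = np.linspace(-50, 50, 20001)
scales = np.array([0.5, 1.0, 2.0, 4.0])
shifts = np.arange(-6.0, 7.0)
W = cwt(psi_ab(t, 2.0, 3.0), t, scales, shifts)   # analyze the wavelet at a = 2, b = 3
```

The grid here is deliberately coarse in $(a, b)$; a dyadic discretization as in Figure 2.12(c)
would instead use scales $2^m$ and shifts proportional to $2^m$.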

2.6.5 Block Transforms 

An easy way to obtain a time-frequency representation is to slice the signal into 
nonoverlapping adjacent blocks and expand each block independently. For example, 
this can be done using a window function on the signal which is the indicator
function of the interval $[nT, (n+1)T)$, periodizing each windowed signal with period
T and applying an expansion such as the Fourier series on each periodized signal (see 
Section 4.1.2). Of course, the arbitrary segmentation at points nT creates artificial 
boundary problems. Yet, such transforms are used due to their simplicity. For 
example, in discrete time, block transforms such as the Karhunen-Loeve transform 
(see Section 7.1.1) and its approximations are quite popular. 
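A discrete-time sketch of a block transform: the signal is cut into nonoverlapping blocks of
length $N$ and each block is expanded with an orthonormal DFT, the discrete counterpart of a
Fourier series of the periodized block. The helper names are hypothetical, and the sketch
assumes the signal length is a multiple of $N$.

```python
import numpy as np

def block_dft(x, N):
    """Cut x into nonoverlapping length-N blocks and expand each block
    independently with an orthonormal length-N DFT (one row per block)."""
    x = np.asarray(x)
    if len(x) % N:
        raise ValueError("this sketch assumes len(x) is a multiple of N")
    return np.fft.fft(x.reshape(-1, N), axis=1, norm="ortho")

def inverse_block_dft(X):
    """Invert each block's DFT and concatenate the blocks again."""
    return np.fft.ifft(X, axis=1, norm="ortho").reshape(-1)

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
X = block_dft(x, 8)            # 8 blocks, 8 coefficients each
```

Because every block is expanded with an orthonormal transform, the overall expansion is
orthonormal and preserves energy. Applied to a ramp, however, each periodized block jumps at
its boundary, which is the kind of artificial boundary effect mentioned above.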

2.6.6 Wigner-Ville Distribution 

An alternative to linear expansions of signals is given by bilinear expansions, of which the
Wigner-Ville distribution is the best known [53, 59, 135].

Bilinear or quadratic time-frequency representations are motivated by the idea
of an "instantaneous power spectrum", of which the spectrogram (see (2.6.7)) is
a possible example. In addition, the time-frequency distribution $\mathrm{TFD}_f(\omega, \tau)$ of
a signal $f(t)$ with Fourier transform $F(\omega)$ should satisfy the following marginal
properties: its integral along $\tau$ given $\omega$ should equal $|F(\omega)|^2$, and its integral along
$\omega$ given $\tau$ should equal $|f(\tau)|^2$. Also, time-frequency shift invariance is desirable,
that is, if $g(t) = f(t - \tau_0)\, e^{j\omega_0 t}$, then

$$\mathrm{TFD}_g(\omega, \tau) = \mathrm{TFD}_f(\omega - \omega_0, \tau - \tau_0).$$

The Wigner-Ville distribution satisfies the above requirements, as well as several
other desirable ones [135]. It is defined, for a signal $f(t)$, as

$$\mathrm{WD}_f(\omega, \tau) = \int_{-\infty}^{\infty} f(\tau + t/2)\, f^*(\tau - t/2)\, e^{-j\omega t}\, dt. \tag{2.6.9}$$

A related distribution is the ambiguity function [216], which is dual to (2.6.9) 
through a two-dimensional Fourier transform. 

The attractive feature of time-frequency distributions such as the Wigner-Ville 
distribution above is the possible improved time-frequency resolution. For signals 
with a single time- frequency component (such as a linear chirp signal), the Wigner- 
Ville distribution gives a very clear and concentrated energy ridge in the time- 
frequency plane. 

However, the increased resolution for single component signals comes at a price 
for multicomponent signals, with the appearance of cross terms or interferences. If 
there are $N$ components in the signal, there will be $N$ signal terms and one cross
term for each pair of components, that is, $\binom{N}{2}$ or $N(N-1)/2$ cross terms. While
these interferences can be smoothed, this smoothing will come at the price of some 
resolution loss. In any case, the interference patterns make it difficult to visually 
interpret quadratic time-frequency distributions of complex signals. 
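The sketch below evaluates a discrete-time version of (2.6.9), under the assumption that lags
are restricted to the available samples and no smoothing window is applied. With samples
$x[n]$, the lag $t/2$ becomes an integer $m$, so the distribution is
$\sum_m x[n+m]\, x^*[n-m]\, e^{-j2\omega m}$, which is $\pi$-periodic in $\omega$. For a complex
linear chirp, the ridge follows the instantaneous frequency; for two complex tones, a cross
term appears halfway between them. The function name and all parameters are hypothetical.

```python
import numpy as np

def wigner_ville(x, K=512):
    """Discrete Wigner-Ville distribution W[n, k] on the grid omega_k = pi * k / K.

    W[n, k] = sum_m x[n+m] conj(x[n-m]) exp(-2j * omega_k * m), with the lag m
    limited so that both n+m and n-m are valid indices (no smoothing window).
    """
    x = np.asarray(x, dtype=complex)
    N = len(x)
    omegas = np.pi * np.arange(K) / K
    W = np.empty((N, K))
    for n in range(N):
        M = min(n, N - 1 - n)
        m = np.arange(-M, M + 1)
        r = x[n + m] * np.conj(x[n - m])          # instantaneous autocorrelation
        # r[-m] = conj(r[m]), so the sum is real up to rounding errors
        W[n] = np.real(np.exp(-2j * np.outer(omegas, m)) @ r)
    return omegas, W

n = np.arange(256)
chirp = np.exp(1j * (0.001 * n**2 + 0.3 * n))     # instantaneous frequency 0.002 n + 0.3
omegas, W = wigner_ville(chirp)
ridge = omegas[np.argmax(W, axis=1)]               # peak frequency at each time
```

For two tones $e^{j\omega_1 n} + e^{j\omega_2 n}$, the cross term at $(\omega_1 + \omega_2)/2$
oscillates in time as $2\cos((\omega_1 - \omega_2) n)$ and can be twice as large as either signal
term, which illustrates why cross terms complicate visual interpretation.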
APPENDIX 2.A Bounded Linear Operators on Hilbert Spaces

Definition 2.8

An operator $A$ which maps one Hilbert space $H_1$ into another Hilbert space
$H_2$ (which may be the same) is called a linear operator if for all $x, y$ in $H_1$
and $\alpha$ in $\mathbb{C}$

(a) $A(x + y) = Ax + Ay$.

(b) $A(\alpha x) = \alpha Ax$.

The norm of $A$, denoted by $\|A\|$, is given by

$$\|A\| = \sup_{\|x\| = 1} \|Ax\|.$$

A linear operator $A : H_1 \to H_2$ is called bounded if

$$\sup_{\|x\| \le 1} \|Ax\| < \infty.$$

An important property of bounded linear operators is that they are continuous,
that is, if $x_n \to x$ then $Ax_n \to Ax$. An example of a bounded operator is the
multiplication operator in $l_2(\mathbb{Z})$, defined as

$$(Ax)[n] = m[n]\, x[n],$$

where $m[n] \in l_\infty(\mathbb{Z})$. Because

$$\|Ax\|^2 = \sum_n |m[n]|^2\, |x[n]|^2 \le \left(\sup_n |m[n]|\right)^2 \|x\|^2,$$

the operator is bounded. A bounded linear operator $A : H_1 \to H_2$ is called invertible
if there exists a bounded linear operator $A^{-1} : H_2 \to H_1$ such that

$$A^{-1}Ax = x \ \text{ for every } x \text{ in } H_1, \qquad AA^{-1}y = y \ \text{ for every } y \text{ in } H_2.$$

The operator $A^{-1}$ is called the inverse of $A$. An important result is the following:
Suppose $A$ is a bounded linear operator mapping $H$ onto itself, and $\|A\| < 1$. Then
$I - A$ is invertible, and for every $y$ in $H$,

$$(I - A)^{-1} y = \sum_{k=0}^{\infty} A^k y. \tag{2.A.1}$$
Note that although the above expansion has the same form for a scalar as well
as an operator, one should not forget the distinction between the two. Another
important notion is that of an adjoint operator.$^{15}$ It can be shown that for every $y$
in $H_2$ there exists a unique $y^*$ in $H_1$ such that, for every $x$ in $H_1$,

$$\langle Ax, y \rangle_{H_2} = \langle x, y^* \rangle_{H_1} = \langle x, A^*y \rangle_{H_1}. \tag{2.A.2}$$

The operator $A^* : H_2 \to H_1$ defined by $A^*y = y^*$ is the adjoint of $A$. Note that $A^*$
is also linear and bounded, and that $\|A\| = \|A^*\|$. If $H_2 = H_1$ and $A = A^*$, then $A$
is called a self-adjoint or hermitian operator.
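In finite dimensions these notions reduce to matrix facts that are easy to check numerically.
The sketch below uses an arbitrary random complex matrix as a stand-in for $A$ (rescaled so that
$\|A\| = 0.5$) and hypothetical helper names. It sums the series (2.A.1) and verifies the
defining relation (2.A.2) with $A^*$ equal to the Hermitian transpose, assuming an inner product
that is conjugate-linear in its first argument.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
# A random complex matrix standing in for A, rescaled so that ||A|| = 0.5 < 1
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A *= 0.5 / np.linalg.norm(A, 2)        # ord=2: largest singular value (operator norm)

def neumann_apply(A, y, terms=100):
    """Partial sum sum_{k=0}^{terms-1} A^k y of the series in (2.A.1)."""
    acc = np.zeros_like(y)
    term = y.copy()
    for _ in range(terms):
        acc = acc + term
        term = A @ term
    return acc

def inner(u, v):
    """<u, v> = sum_i conj(u_i) v_i (conjugate-linear in the first argument)."""
    return np.vdot(u, v)

A_adj = A.conj().T                      # adjoint of a matrix: Hermitian transpose

y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
```

Since $\|A\| = 0.5$, the truncation error after 100 terms is at most $0.5^{100}\|y\|/(1 - 0.5)$, far
below floating-point precision.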

Finally, an important type of operator is the projection operator. Given a closed
subspace $S$ of a Hilbert space $H$, an operator $P$ is called an orthogonal projection
onto $S$ if

$$P(v + w) = v \quad \text{for all } v \in S \text{ and } w \in S^\perp.$$

It can be shown that an operator is an orthogonal projection if and only if $P^2 = P$
and $P$ is self-adjoint.
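A finite-dimensional illustration: if the columns of $Q$ form an orthonormal basis of $S$, then
$P = QQ^H$ is the orthogonal projection onto $S$. The sketch below (dimensions and random data
are arbitrary choices) checks $P^2 = P$, self-adjointness, and $P(v + w) = v$ for $v \in S$ and
$w \in S^\perp$.

```python
import numpy as np

rng = np.random.default_rng(2)
# Orthonormal basis (columns of Q) of a 2-dimensional subspace S of C^5
Q, _ = np.linalg.qr(rng.standard_normal((5, 2)) + 1j * rng.standard_normal((5, 2)))
P = Q @ Q.conj().T                      # orthogonal projection onto S

v = Q @ rng.standard_normal(2)          # a vector in S
z = rng.standard_normal(5) + 1j * rng.standard_normal(5)
w = z - P @ z                           # the component of z in S-perp

print(np.allclose(P @ P, P), np.allclose(P, P.conj().T), np.allclose(P @ (v + w), v))
```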

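Let us now show how we can associate a possibly infinite matrix$^{16}$ with a given
bounded linear operator on a Hilbert space. Given is a bounded linear operator $A$
on a Hilbert space $H$ with the orthonormal basis $\{x_i\}$. Then any $x$ from $H$ can be
written as $x = \sum_i \langle x_i, x \rangle x_i$, and

$$Ax = \sum_i \langle x_i, x \rangle\, Ax_i, \qquad Ax_i = \sum_k \langle x_k, Ax_i \rangle\, x_k.$$

Similarly, writing $y = \sum_i \langle x_i, y \rangle x_i$, we can write $Ax = y$ as

$$\begin{pmatrix} \langle x_1, Ax_1 \rangle & \langle x_1, Ax_2 \rangle & \cdots \\ \langle x_2, Ax_1 \rangle & \langle x_2, Ax_2 \rangle & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} \begin{pmatrix} \langle x_1, x \rangle \\ \langle x_2, x \rangle \\ \vdots \end{pmatrix} = \begin{pmatrix} \langle x_1, y \rangle \\ \langle x_2, y \rangle \\ \vdots \end{pmatrix},$$

or, in other words, the matrix $\{a_{ij}\}$ corresponding to the operator $A$ expressed
with respect to the basis $\{x_i\}$ is defined by $a_{ij} = \langle x_i, Ax_j \rangle$.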
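In $\mathbb{C}^n$, with an orthonormal basis stored as the columns of a unitary matrix $X$, the
entries $a_{ij} = \langle x_i, Ax_j \rangle$ form the matrix $X^H A X$. The sketch below (random data,
hypothetical helper names) builds it and confirms that it maps the coefficient vector of $x$ to
that of $y = Ax$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))     # an operator on C^n
# Orthonormal basis {x_i}: the columns of a random unitary matrix X
X, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

def operator_matrix(A, X):
    """Matrix of A in the basis given by the columns of X: a_ij = <x_i, A x_j>."""
    return X.conj().T @ A @ X

def coords(X, v):
    """Coefficient vector (<x_1, v>, <x_2, v>, ...) of v."""
    return X.conj().T @ v

M = operator_matrix(A, X)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = A @ x
```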

APPENDIX 2.B Parametrization of Unitary Matrices 

Our aim in this appendix is to show two ways of factoring real, n x n, unitary 
matrices, namely using Givens rotations and Householder building blocks. We 
concentrate here on real, square matrices, since these are the ones we will be using 
in Chapter 3. The treatment here is fairly brisk; for a more detailed, yet succinct 
account of these two factorizations, see [308]. 



15 In the case of matrices, the adjoint is the hermitian transpose. 

16 To be consistent with our notation throughout the book, in this context, matrices will be 
denoted by capital bold letters, while vectors will be denoted by lower-case bold letters. 
Figure 2.13 Unitary matrices. (a) Factorization of a real, unitary, $n \times n$
matrix. (b) The structure of the block $U_i$.



2.B.1 Givens Rotations 

Recall that a real, $n \times n$, unitary matrix $U$ satisfies (2.3.6). We want to show
that such a matrix can be factored as in Figure 2.13, where each cross in part (b)
represents a Givens (planar) rotation

$$G_\alpha = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}. \tag{2.B.1}$$



The way to demonstrate this is to show that any real, unitary $n \times n$ matrix $U_n$ can
be expressed as

$$U_n = R_{n-2} \cdots R_0 \begin{pmatrix} U_{n-1} & 0 \\ 0 & \pm 1 \end{pmatrix}, \tag{2.B.2}$$

where $U_{n-1}$ is an $(n-1) \times (n-1)$, real, unitary matrix, and $R_i$ is of the following
form:

$$R_i = \begin{pmatrix} I_i & & & \\ & \cos\alpha_i & 0 & -\sin\alpha_i \\ & 0 & I_{n-i-2} & 0 \\ & \sin\alpha_i & 0 & \cos\alpha_i \end{pmatrix},$$

that is, we have a planar rotation in rows $(i+1)$ and $n$. By repeating the process
on the matrix $U_{n-1}$, we obtain the factorization as in Figure 2.13. The proof that
any real, unitary matrix can be written as in (2.B.2) can be found in [308]. Note
that the number of free variables (angles in Givens rotations) is $n(n-1)/2$.
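The reduction behind (2.B.2) can be sketched numerically. The code below is a standard Givens
QR sweep, and its ordering of rotations is not necessarily the one drawn in Figure 2.13: it
zeroes the subdiagonal entries of a real orthogonal matrix one at a time with planar rotations,
leaving a diagonal matrix with entries $\pm 1$, and uses $n(n-1)/2$ angles. Each rotation used
here is $G_\alpha^T$ from (2.B.1), that is, a rotation by $-\alpha$. Function names are hypothetical.

```python
import numpy as np

def _rotation(n, p, q, alpha):
    """Identity except for the entries [[c, s], [-s, c]] in rows/columns p and q."""
    c, s = np.cos(alpha), np.sin(alpha)
    G = np.eye(n)
    G[p, p], G[p, q], G[q, p], G[q, q] = c, s, -s, c
    return G

def givens_factor(U):
    """Return rotations (p, q, alpha) and D such that G_K ... G_1 U = D (diagonal, +-1)."""
    n = U.shape[0]
    V = np.array(U, dtype=float)
    rotations = []
    for p in range(n - 1):
        for q in range(n - 1, p, -1):
            alpha = np.arctan2(V[q, p], V[p, p])
            V = _rotation(n, p, q, alpha) @ V     # zeroes V[q, p]
            rotations.append((p, q, alpha))
    return rotations, V

def givens_rebuild(rotations, D):
    """Recover U = G_1^T G_2^T ... G_K^T D."""
    U = np.array(D, dtype=float)
    for p, q, alpha in reversed(rotations):
        U = _rotation(D.shape[0], p, q, alpha).T @ U
    return U

rng = np.random.default_rng(4)
U, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # a random real orthogonal matrix
rotations, D = givens_factor(U)
```

For $n = 5$ this produces $5 \cdot 4 / 2 = 10$ angles, matching the count of free variables.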

2.B.2 Householder Building Blocks 

A unitary matrix can be factored in terms of Householder building blocks, where
each block has the form $I - 2\, uu^T$, and $u$ is a unit-norm vector. Thus, an $n \times n$
unitary matrix $U$ can be written as

$$U = H_1 \cdots H_{n-1} \cdot D, \tag{2.B.3}$$

where $D$ is diagonal with $d_{ii} = e^{j\theta_i}$, and $H_i$ are Householder blocks $I - 2\, u_i u_i^T$.

We mention the Householder factorization here because we will use its polynomial
version to factor lossless matrices in Chapter 3.

Note that the Householder building block is unitary, and that the factorization
in (2.B.3) can be proved similarly to the factorization using Givens rotations. That
is, we can first show that

$$H_1 U = \begin{pmatrix} e^{j\alpha_0} & 0 \\ 0 & U_1 \end{pmatrix},$$

where $U_1$ is an $(n-1) \times (n-1)$ unitary matrix. Repeating the process on $U_1, U_2, \ldots$,
we finally obtain

$$H_{n-1} \cdots H_1 U = D,$$

but since $H_i^{-1} = H_i$, we obtain (2.B.3).
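Similarly, here is a minimal sketch of the Householder factorization for a real orthogonal
matrix, where $D$ has entries $\pm 1$ (the real case of $e^{j\theta_i}$). Each step chooses a unit
vector $u_k$ so that the reflection $I - 2u_k u_k^T$ maps the remaining part of column $k$ onto a
multiple of the $k$th unit vector; the sign choice is the usual one that avoids cancellation.
Function names are hypothetical.

```python
import numpy as np

def householder_factor(U):
    """Return [u_1, ..., u_{n-1}] and D with H_{n-1} ... H_1 U = D, H_k = I - 2 u_k u_k^T."""
    n = U.shape[0]
    V = np.array(U, dtype=float)
    us = []
    for k in range(n - 1):
        x = V[k:, k]                                  # unit-norm tail of column k
        v = x.copy()
        v[0] += (1.0 if x[0] >= 0 else -1.0) * np.linalg.norm(x)   # avoid cancellation
        u = np.zeros(n)
        u[k:] = v / np.linalg.norm(v)
        V = V - 2.0 * np.outer(u, u @ V)              # V <- (I - 2 u u^T) V
        us.append(u)
    return us, V

def householder_rebuild(us, D):
    """U = H_1 ... H_{n-1} D, using that each H_k is its own inverse."""
    n = D.shape[0]
    M = np.eye(n)
    for u in us:
        M = M @ (np.eye(n) - 2.0 * np.outer(u, u))
    return M @ D

rng = np.random.default_rng(5)
U, _ = np.linalg.qr(rng.standard_normal((5, 5)))
us, D = householder_factor(U)
```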
APPENDIX 2.C Convergence and Regularity of Functions

In Section 2.4.3, when discussing Fourier series, we pointed out possible convergence 
problems such as the Gibbs phenomenon. In this appendix, we first review different 
types of convergence and then discuss briefly some convergence properties of Fourier 
series and transforms. Then, we discuss regularity of functions and the associated 
decay of the Fourier series and transforms. More details on these topics can be 
found for example in [46, 326]. 



2.C.1 Convergence 

Pointwise Convergence Given an infinite sequence of functions $\{f_n\}_{n=1}^{\infty}$, we say
that it converges pointwise to a limit function $f = \lim_{n\to\infty} f_n$ if for each value of $t$
we have

$$\lim_{n\to\infty} f_n(t) = f(t).$$

This is a relatively weak form of convergence, since certain properties of $f_n(t)$, such
as continuity, are not passed on to the limit. Consider the truncated Fourier series,
that is (from (2.4.13))

$$f_n(t) = \sum_{k=-n}^{n} F[k]\, e^{jk\omega_0 t}. \tag{2.C.1}$$

This Fourier series converges pointwise for all $t$ when $F[k]$ are the Fourier coefficients
(see (2.4.14)) of a piecewise smooth$^{17}$ function $f(t)$. Note that while each $f_n(t)$ is
continuous, the limit need not be.



Uniform Convergence An infinite sequence of functions $\{f_n\}_{n=1}^{\infty}$ converges uniformly
to a limit $f(t)$ on a closed interval $[a, b]$ if (i) the sequence converges pointwise
on $[a, b]$ and (ii) given any $\epsilon > 0$, there exists an integer $N$ such that for $n > N$,
$f_n(t)$ satisfies $|f(t) - f_n(t)| < \epsilon$ for all $t$ in $[a, b]$.

Uniform convergence is obviously stronger than pointwise convergence. For 
example, uniform convergence of the truncated Fourier series (2.C.1) implies con- 
tinuity of the limit, and conversely, continuous piecewise smooth functions have 
uniformly convergent Fourier series [326]. An example of pointwise convergence 
without uniform convergence is the Fourier series of piecewise smooth but discon- 
tinuous functions and the associated Gibbs phenomenon around discontinuities. 
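The Gibbs phenomenon can be observed with the truncated series (2.C.1) of a square wave. The
sketch assumes the $2\pi$-periodic wave $\operatorname{sign}(\sin t)$, so $\omega_0 = 1$, together with the
coefficient convention $F[k] = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)\, e^{-jkt}\, dt$; then $F[k] = 2/(j\pi k)$ for odd
$k$ and 0 otherwise. The function name is hypothetical.

```python
import numpy as np

def square_wave_partial_sum(t, n):
    """Truncated Fourier series f_n(t) of the 2*pi-periodic square wave sign(sin t).

    With F[k] = 2/(j pi k) for odd k (0 otherwise), pairing the terms k and -k in
    (2.C.1) gives (4/pi) * sum over odd k <= n of sin(k t)/k.
    """
    k = np.arange(1, n + 1, 2)
    return (4 / np.pi) * (np.sin(np.outer(t, k)) @ (1.0 / k))

t = np.linspace(0.0, 0.05, 5001)
peak = square_wave_partial_sum(t, 401).max()   # close to the Gibbs value of about 1.179
```

Near $t = 0$ the partial sums overshoot to about 1.179 (roughly 9% of the jump of 2) however
large $n$ is, so the convergence is not uniform. At $t = 0$ itself they equal 0, the mean of the
left and right limits, while the mean square error still decreases with $n$.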



17 A piecewise smooth function on an interval is piecewise continuous (finite number of disconti- 
nuities) and its derivative is also piecewise continuous. 
Mean Square Convergence An infinite sequence of functions $\{f_n\}_{n=1}^{\infty}$ converges
in the mean square sense to a limit $f(t)$ if

$$\lim_{n\to\infty} \|f - f_n\|_2 = 0.$$

Note that this does not mean that $\lim_{n\to\infty} f_n = f$ for all $t$, but only almost
everywhere. For example, the truncated Fourier series (2.C.1) of a piecewise smooth
function converges in the mean square sense to $f(t)$ when $F[k]$ are the Fourier
series coefficients of $f(t)$, even though at a point of discontinuity $t_0$, $f(t_0)$ might be
different from $\lim_{n\to\infty} f_n(t_0)$, which equals the mean of the right and left limits.

In the case of the Fourier transform, the concept analogous to the truncated
Fourier series (2.C.1) is the truncated integral defined from the Fourier inversion
formula (2.4.2) as

$$f_c(t) = \frac{1}{2\pi} \int_{-c}^{c} F(\omega)\, e^{j\omega t}\, d\omega,$$

where $F(\omega)$ is the Fourier transform of $f(t)$ (see (2.4.1)). The convergence of the
above integral as $c \to \infty$ is an important question, since the limit $\lim_{c\to\infty} f_c(t)$
might not equal $f(t)$. Under suitable restrictions on $f(t)$, equality will hold. As an
example, if $f(t)$ is piecewise smooth and absolutely integrable, then $\lim_{c\to\infty} f_c(t_0) =
f(t_0)$ at each point of continuity, and the limit is equal to the mean of the left and right
limits at discontinuity points [326].
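As an illustration, the sketch below evaluates $f_c(t)$ numerically (trapezoidal rule) for the
rectangular pulse equal to 1 on $|t| < 1$, whose Fourier transform is $2\sin\omega/\omega$. The
pulse, the cutoff $c = 200$, the grid size, and the function names are arbitrary choices. For
large $c$, $f_c$ approaches 1 inside the pulse, 0 outside, and $1/2$ at the discontinuity $t = 1$.

```python
import numpy as np

def rect_ft(w):
    """Fourier transform 2 sin(w)/w of the pulse equal to 1 on |t| < 1."""
    return 2 * np.sinc(w / np.pi)              # np.sinc(x) = sin(pi x)/(pi x)

def truncated_inverse_ft(F, t, c, num=200001):
    """f_c(t) = (1/2pi) * integral_{-c}^{c} F(w) exp(j w t) dw, trapezoidal rule."""
    w = np.linspace(-c, c, num)
    vals = F(w)[None, :] * np.exp(1j * np.outer(t, w))
    dw = w[1] - w[0]
    integral = dw * (vals.sum(axis=1) - 0.5 * (vals[:, 0] + vals[:, -1]))
    return integral / (2 * np.pi)

fc = truncated_inverse_ft(rect_ft, np.array([0.0, 1.0, 2.0]), c=200.0)
```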

2.C.2 Regularity 

So far, we have mostly discussed functions satisfying some integral conditions (absolutely
or square-integrable functions, for example). Instead, regularity is concerned
with differentiability. The space of continuous functions is called $C^0$, and similarly,
$C^n$ is the space of functions having $n$ continuous derivatives.

A finer analysis is obtained using Lipschitz (or Hölder) exponents. A function
$f$ is called Lipschitz of order $\alpha$, $0 < \alpha \le 1$, if for any $t$ and some small $\epsilon$, we have

$$|f(t) - f(t + \epsilon)| \le c\, |\epsilon|^{\alpha}. \tag{2.C.2}$$

Higher orders $r = n + \alpha$ can be obtained by replacing $f$ with its $n$th derivative.
This defines Hölder spaces of order $r$. Note that condition (2.C.2) for $\alpha = 1$ is
weaker than differentiability. For example, the triangle function or linear spline
$f(t) = 1 - |t|$, $t \in [-1, 1]$, and 0 otherwise, is Lipschitz of order 1 but only $C^0$.

How does regularity manifest itself in the Fourier domain? Since differentiation
amounts to a multiplication by $(j\omega)$ in the Fourier domain (see (2.4.6)), existence of
derivatives is related to sufficient decay of the Fourier spectrum.
It can be shown (see [216]) that if a function $f(t)$ and all its derivatives up
to order $n$ exist and are of bounded variation, then the Fourier transform can be
bounded by

$$|F(\omega)| \le \frac{c}{1 + |\omega|^{n+1}}, \tag{2.C.3}$$

that is, it decays as $O(1/|\omega|^{n+1})$ for large $\omega$. Conversely, if $F(\omega)$ has a decay as in
(2.C.3), then $f(t)$ has $n - 1$ continuous derivatives, and the $n$th derivative exists but
might be discontinuous. A finer analysis of regularity and associated localization in
the Fourier domain can be found in [241], in particular for functions in Hölder spaces
and using different norms in the Fourier domain.
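The link between smoothness and decay in (2.C.3) can be seen on B-splines, that is, the unit box
convolved with itself. The degree-$d$ spline has $d - 1$ continuous derivatives and a
piecewise-constant $d$th derivative, and its Fourier transform $(\sin(\omega/2)/(\omega/2))^{d+1}$
decays as $1/|\omega|^{d+1}$. The sketch estimates that exponent by comparing the peak of $|F(\omega)|$
over $[W, 2W]$ at two scales; the estimator and the function names are illustrative, not a
method from the text.

```python
import numpy as np

def bspline_ft(w, degree):
    """Fourier transform (sin(w/2)/(w/2))^(degree+1) of the centered B-spline of the
    given degree (the unit box convolved with itself `degree` times)."""
    return np.sinc(w / (2 * np.pi)) ** (degree + 1)

def decay_exponent(F, W1=1e3, W2=1e4, num=200001):
    """Heuristic estimate of p in |F(w)| ~ C / |w|^p from the peak of |F| on [W, 2W]."""
    def peak(W):
        return np.abs(F(np.linspace(W, 2 * W, num))).max()
    return np.log(peak(W1) / peak(W2)) / np.log(W2 / W1)

for d in range(4):
    print(d, round(decay_exponent(lambda w: bspline_ft(w, d)), 2))
```

The estimated exponents come out close to $d + 1$: each extra order of smoothness buys one
more power of decay.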
Problems

2.1 Legendre polynomials: Consider the interval $[-1, 1]$ and the vectors $1, t, t^2, t^3, \ldots$. Using
Gram-Schmidt orthogonalization, find an equivalent orthonormal set.

2.2 Prove Theorem 2.4, parts (a), (b), (d), (e), for finite-dimensional Hilbert spaces, $\mathbb{R}^n$ or $\mathbb{C}^n$.

2.3 Orthogonal transforms and $l_\infty$ norm: Orthogonal transforms conserve the $l_2$ norm, but not
others, in general. The $l_\infty$ norm of a vector is defined as (assume $v \in \mathbb{R}^n$):

$$l_\infty[v] = \max_{i=0,\ldots,n-1} |v_i|.$$

(a) Consider $n = 2$ and the set of real orthogonal transforms $T_2$, that is, plane rotations.
Given the set of vectors $v$ with unit $l_2$ norm (that is, vectors on the unit circle), give
lower and upper bounds such that

$$a_2 \le l_\infty[T_2 \cdot v] \le b_2.$$

(b) Give the lower and upper bounds for the general case $n > 2$, that is, $a_n$ and $b_n$.

2.4 Norm of operators: Consider operators that map $l_2(\mathbb{Z})$ to itself, and indicate their norm,
or bounds on their norm.

(a) $(Ax)[n] = m[n] \cdot x[n]$, $m[n] = e^{j\Theta n}$, $n \in \mathbb{Z}$.

(b) $(Ax)[2n] = x[2n] + x[2n+1]$, $(Ax)[2n+1] = x[2n] - x[2n+1]$, $n \in \mathbb{Z}$.

2.5 Assume a finite-dimensional space $\mathbb{R}^n$ and an orthonormal basis $\{x_1, x_2, \ldots, x_n\}$. Any
vector $y$ can thus be written as $y = \sum_i \alpha_i x_i$ where $\alpha_i = \langle x_i, y \rangle$. Consider the best
approximation to $y$ in the least-squares sense and living on the subspace spanned by the
first $K$ vectors, $\{x_1, x_2, \ldots, x_K\}$, or $\hat{y} = \sum_{i=1}^{K} \beta_i x_i$. Prove that $\beta_i = \alpha_i$ for $i = 1, \ldots, K$,
by showing that it minimizes $\|y - \hat{y}\|$. Hint: Use Parseval's equality.

2.6 Least-squares solution: Show that for the least-squares solution obtained in Section 2.3.2,
the partial derivatives $\partial(\|y - \hat{y}\|^2)/\partial x_i$ are all zero.

2.7 Least-squares solution to a linear system of equations: The general solution was given in
Equations (2.3.4)-(2.3.5).

(a) Show that if $y$ belongs to the column space of $A$, then $\hat{y} = y$.

(b) Show that if $y$ is orthogonal to the column space of $A$, then $\hat{y} = 0$.

2.8 Parseval's formulas can be proven by using orthogonality and biorthogonality relations of 
the basis vectors. 

(a) Show relations (2.2.5-2.2.6) using the orthogonality of the basis vectors. 

(b) Show relations (2.2.11-2.2.13) using the biorthogonality of the basis vectors. 
2.9 Consider the space of square-integrable real functions on the interval $[-\pi, \pi]$, $L_2([-\pi, \pi])$,
and the associated orthonormal basis given by

$$\left\{ \frac{1}{\sqrt{2\pi}},\ \frac{\cos nx}{\sqrt{\pi}},\ \frac{\sin nx}{\sqrt{\pi}} \right\}, \quad n = 1, 2, \ldots$$

Consider the following two subspaces: $S$ - space of symmetric functions, that is, $f(x) =
f(-x)$, on $[-\pi, \pi]$, and $A$ - space of antisymmetric functions, $f(x) = -f(-x)$, on $[-\pi, \pi]$.

(a) Show how any function $f(x)$ from $L_2([-\pi, \pi])$ can be written as $f(x) = f_s(x) + f_a(x)$,
where $f_s(x) \in S$ and $f_a(x) \in A$.

(b) Give orthonormal bases for $S$ and $A$.

(c) Verify that $L_2([-\pi, \pi]) = S \oplus A$.

2.10 Downsampling by $N$: Prove (2.5.13) by going back to the underlying time-domain signal
and resampling it with an $N$-times longer sampling period. That is, consider $x[n]$ and
$y[n] = x[nN]$ as two sampled versions of the same continuous-time signal, with sampling
periods $T$ and $NT$, respectively. Hint: Recall that the discrete-time Fourier transform
$X(e^{j\omega})$ of $x[n]$ is (see (2.4.36))

$$X(e^{j\omega}) = X_T\!\left(\frac{\omega}{T}\right) = \frac{1}{T} \sum_{k=-\infty}^{\infty} X_c\!\left(\frac{\omega - 2\pi k}{T}\right),$$

where $T$ is the sampling period and $X_c$ denotes the Fourier transform of the underlying
continuous-time signal. Then $Y(e^{j\omega}) = X_{NT}(\omega/NT)$ (since the sampling period
is now $NT$), where $X_{NT}(\omega/NT)$ can be written similarly to the above equation. Finally,
split the sum involved in $X_{NT}(\omega/NT)$ into $k = nN + l$, and gathering terms, (2.5.13) will
follow.

2.11 Downsampling and aliasing: If an arbitrary discrete-time sequence $x[n]$ is input to a filter
followed by downsampling by 2, we know that an ideal half-band lowpass filter (that is,
$|H(e^{j\omega})| = 1$, $|\omega| < \pi/2$, and $H(e^{j\omega}) = 0$, $\pi/2 < |\omega| < \pi$) will avoid aliasing.

(a) Show that $H'(e^{j\omega}) = H(e^{j2\omega})$ will also avoid aliasing.

(b) Same for $H''(e^{j\omega}) = H(e^{j(2\omega - \pi)})$.

(c) A two-channel system using $H(e^{j\omega})$ and $H(e^{j(\omega - \pi)})$ followed by downsampling by
2 will keep all parts of the input spectrum untouched in either channel (except at
$\omega = \pi/2$). Show that this is also true if $H'(e^{j\omega})$ and $H''(e^{j\omega})$ are used instead.

2.12 In pattern recognition, it is sometimes useful to expand a signal using the desired pattern,
or template, and its shifts, as basis functions. For simplicity, consider a signal of length $N$,
$x[n]$, $n = 0, \ldots, N-1$, and a pattern $p[n]$, $n = 0, \ldots, N-1$. Then, choose as basis functions

$$\varphi_k[n] = p[(n - k) \bmod N], \quad k = 0, \ldots, N-1,$$

that is, circular shifts of $p[n]$.

(a) Derive a simple condition on $p[n]$, so that any $x[n]$ can be written as a linear
combination of $\{\varphi_k\}$.

(b) Assuming the previous condition is met, give the coefficients $\alpha_k$ of the expansion

$$x[n] = \sum_{k=0}^{N-1} \alpha_k\, \varphi_k[n].$$



2.13 Show that a linear, periodically time-varying system of period $N$ can be implemented with
a polyphase transform followed by upsampling by $N$, $N$ filter operations and a summation.

2.14 Interpolation of oversampled signals: Assume a function $f(t)$ bandlimited to $\omega_m = \pi$. If
the sampling frequency is chosen at the Nyquist rate, $\omega_s = 2\pi$, the interpolation filter is
the usual sinc filter with slow decay ($\sim 1/t$). If $f(t)$ is oversampled, for example, with
$\omega_s = 3\pi$, then filters with faster decay can be used for interpolating $f(t)$ from its samples.
Such filters are obtained by convolving (in frequency) elementary rectangular filters (two
for $H_2(\omega)$, three for $H_3(\omega)$, while $H_1(\omega)$ would be the usual sinc filter).

(a) Give the expression for $h_2(t)$, and verify that it decays as $1/t^2$.

(b) Same for $h_3(t)$, which decays as $1/t^3$. Show that $H_3(\omega)$ has a continuous derivative.

(c) By generalizing the construction above of $H_2(\omega)$ and $H_3(\omega)$, show that one can obtain
$h_i(t)$ with decay $1/t^i$. Also, show that $H_i(\omega)$ has a continuous $(i-2)$th derivative.
However, the filters involved become spread out in time, and the result is only
interesting asymptotically.

2.15 Uncertainty relation: Consider the uncertainty relation $\Delta_\omega^2\, \Delta_t^2 \ge \pi/2$.

(a) Show that scaling does not change $\Delta_\omega^2 \cdot \Delta_t^2$. Either use scaling that conserves the $L_2$
norm ($f'(t) = \sqrt{a}\, f(at)$) or be sure to renormalize $\Delta_\omega^2$, $\Delta_t^2$.

(b) Can you give the time-bandwidth product of a rectangular pulse, $p(t) = 1$, $-1/2 \le
t \le 1/2$, and 0 otherwise?

(c) Same as above, but for a triangular pulse.

(d) What can you say about the time-bandwidth product as the time-domain function is
obtained from convolving more and more rectangular pulses with themselves?

2.16 Consider allpass filters where

$$H(z) = \prod_i \frac{a_i^* + z^{-1}}{1 + a_i z^{-1}}.$$

(a) Assume the filter has real coefficients. Show pole-zero locations, and that numerator
and denominator polynomials are mirrors of each other.

(b) Given $h[n]$, the causal, real-coefficient impulse response of a stable allpass filter, give
its autocorrelation $a[k] = \sum_n h[n]\, h[n-k]$. Show that the set $\{h[n-k]\}$, $k \in \mathbb{Z}$, is an
orthonormal basis for $l_2(\mathbb{Z})$. Hint: Use Theorem 2.4.

(c) Show that the set $\{h[n-2k]\}$ is an orthonormal set but not a basis for $l_2(\mathbb{Z})$.

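2.17 Parseval's relation for nonorthogonal bases: Consider the space $V = \mathbb{R}^n$ and a biorthogonal
basis, that is, two sets $\{\alpha_i\}$ and $\{\beta_i\}$ such that

$$\langle \alpha_i, \beta_j \rangle = \delta[i - j], \quad i, j = 0, \ldots, n-1.$$

(a) Show that any vector $v \in V$ can be written in the following two ways:

$$v = \sum_{i=0}^{n-1} \langle \alpha_i, v \rangle\, \beta_i = \sum_{i=0}^{n-1} \langle \beta_i, v \rangle\, \alpha_i.$$

(b) Call $v_\alpha$ the vector with entries $\langle \alpha_i, v \rangle$ and similarly $v_\beta$ with entries $\langle \beta_i, v \rangle$. Given $\|v\|$,
what can you say about $\|v_\alpha\|$ and $\|v_\beta\|$?

(c) Show that the generalization of Parseval's identity to biorthogonal systems is

$$\langle v, v \rangle = \langle v_\alpha, v_\beta \rangle$$

and

$$\langle v, g \rangle = \langle v_\alpha, g_\beta \rangle.$$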



2.18 Circulant matrices: An $N \times N$ circulant matrix $C$ is defined by its first line, since
subsequent lines are obtained by a right circular shift. Denote the first line by $\{c_0, c_{N-1}, \ldots, c_1\}$
so that $C$ corresponds to a circular convolution with a filter having impulse response

$$\{c_0, c_1, c_2, \ldots, c_{N-1}\}.$$

(a) Give a simple test for the singularity of $C$.

(b) Give a formula for $\det(C)$.

(c) Prove that $C^{-1}$ is circulant.

(d) Show that $C_1 C_2 = C_2 C_1$ and that the result is circulant.

2.19 Walsh basis: To define the Walsh basis, we need the Kronecker product of matrices defined
in (2.3.2). Then, the matrix $W_k$, of size $2^k \times 2^k$, is

$$W_k = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \otimes W_{k-1}, \qquad W_0 = [1], \quad W_1 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$

(a) Give $W_2$, $W_3$ and $W_4$ (last one only partially).

(b) Show that $W_k$ is orthonormal (within a scale factor you should indicate).

(c) Create a block matrix $T$

$$T = \begin{pmatrix} W_0 & & & & \\ & \frac{1}{\sqrt{2}} W_1 & & & \\ & & \frac{1}{2} W_2 & & \\ & & & \frac{1}{2^{3/2}} W_3 & \\ & & & & \ddots \end{pmatrix},$$

and show that $T$ is unitary. Sketch the upper left corner of $T$.

(d) Consider the rows of $T$ as basis functions in an orthonormal expansion of $l_2(\mathbb{Z}^+)$
(right-sided sequences). Sketch the tiling of the time-frequency plane achieved by this
expansion.
Discrete-Time Bases and Filter Banks

"What is more beautiful than the Quincunx,
which, from whatever direction you look,
is correct?"
- Quintilian

Our focus in this chapter will be directed to series expansions of discrete-time
sequences. The reasons for expanding signals, discussed in Chapter 1, are linked
to signal analysis, approximation and compression, as well as algorithms and
implementations. Thus, given an arbitrary sequence $x[n]$, we would like to write it
as

$$x[n] = \sum_{k \in \mathbb{Z}} \langle \varphi_k, x \rangle\, \varphi_k[n], \quad n \in \mathbb{Z}.$$

Therefore, we would like to construct orthonormal sets of basis functions, $\{\varphi_k[n]\}$,
which are complete in the space of square-summable sequences, $l_2(\mathbb{Z})$. More general,
biorthogonal and overcomplete sets will be considered as well.

The discrete-time Fourier series, seen in Chapter 2, is an example of such an 
orthogonal series expansion, but it has a number of shortcomings. Discrete-time 
bases better suited for signal processing tasks will try to satisfy two conflicting 
requirements, namely to achieve good frequency resolution while keeping good time 
locality as well. Additionally, for both practical and computational reasons, the set 
of basis functions has to be structured. Typically, the infinite set of basis functions
$\{\varphi_k\}$ is obtained from a finite number of prototype sequences and their shifted
versions in time. This leads to discrete-time filter banks for the implementation of
such structured expansions. This filter bank point of view has been central to the
developments in the digital signal processing community, and to the design of good 
basis functions or filters in particular. While the expansion is not time-invariant, 
it will at least be periodically time-invariant. Also, the expansions will often have 
a successive approximation property. This means that a reconstruction based on 
an appropriate subset of the basis functions leads to a good approximation of the 
signal, which is an important feature for applications such as signal compression. 
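The Haar basis, one of the two extreme cases considered later in this chapter, gives a minimal
example of such a structured expansion: two prototype sequences, $\varphi_0 = [1, 1]/\sqrt{2}$ and
$\varphi_1 = [1, -1]/\sqrt{2}$, and their shifts by multiples of 2. The sketch below computes the
expansion coefficients $\langle \varphi_k, x \rangle$ and reconstructs $x$; the function names are
hypothetical and the input length is assumed to be even.

```python
import numpy as np

def haar_analysis(x):
    """Coefficients <phi_k, x> for the Haar basis: prototypes [1, 1]/sqrt(2) and
    [1, -1]/sqrt(2), shifted by even amounts (len(x) must be even)."""
    pairs = np.asarray(x, dtype=float).reshape(-1, 2)
    low = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    high = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
    return low, high

def haar_synthesis(low, high):
    """Reconstruct x = sum_k <phi_k, x> phi_k from the two coefficient sets."""
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / np.sqrt(2)
    x[1::2] = (low - high) / np.sqrt(2)
    return x

rng = np.random.default_rng(6)
x = rng.standard_normal(16)
low, high = haar_analysis(x)
approx = haar_synthesis(low, np.zeros_like(high))   # keep only the lowpass coefficients
```

Keeping only the lowpass coefficients gives the pairwise averages of $x$, a coarse
approximation in the spirit of the successive approximation property mentioned above.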

Linear signal expansions have been used in digital signal processing since at 
least the 1960's, mainly as block transforms, such as piecewise Fourier series and 
Karhunen-Loeve transforms [143]. They have also been used as overcomplete ex- 
pansions, such as the short-time Fourier transform (STFT) for signal analysis and 
synthesis [8, 226] and in transmultiplexers [25]. Increased interest in the subject, 
especially in orthogonal and biorthogonal bases, arose with work on compression, 
where redundancy of the expansion such as in the STFT is avoided. In particular, 
subband coding of speech [68, 69] spurred a detailed study of critically sampled 
filter banks. The discovery of quadrature mirror filters (QMF) by Croisier, Esteban 
and Galand in 1976 [69], which allows a signal to be split into two downsampled 
subband signals and then reconstructed without aliasing (spectral foldbacks) even 
though nonideal filters are used, was a key step forward. 

Perfect reconstruction filter banks, that is, subband decompositions where the
reconstructed signal is a perfect replica of the input, followed soon. The first orthogonal solution
was discovered by Smith and Barnwell [270, 271] and Mintzer [196] for the
two-channel case. Fettweis and coworkers [98] gave an orthogonal solution related
to wave digital filters [97]. Vaidyanathan, who established the relation between 
these results and certain unitary operators (paraunitary matrices of polynomials) 
studied in circuit theory [23], gave more general orthogonal solutions [305, 306] 
as well as lattice factorizations for orthogonal filter banks [308, 310]. Biorthogonal 
solutions were given by Vetterli [315], as well as multidimensional quadrature mirror 
filters [314]. Biorthogonal filter banks, in particular with linear phase filters, were 
investigated in [208, 321] and multidimensional filter banks were further studied in 
[155, 163, 257, 264, 325]. Recent work includes filter banks with rational sampling 
factors [166, 206] and filter banks with block sampling [158]. Additional work on 
the design of filter banks has been done in [144, 205] among others. 

In parallel to this work on filter banks, a generalization of block transforms 
called lapped orthogonal transforms (LOT's) was derived by Cassereau [43] and 
Malvar [186, 188, 189]. An attractive feature of a subclass of LOT's is the existence 
of fast algorithms for their implementation since they are modulated filter banks 
(similar to a "real" STFT). The connection of LOT's with filter banks was shown
in [321].
Another development, which happened independently of filter banks but turns
out to be closely related, is the pyramid decomposition of Burt and Adelson [41].
While it is oversampled (overcomplete), it clearly uses multiresolution concepts, by
decomposing a signal into a coarse approximation plus added details. This frame- 
work is central to wavelet decompositions and establishes conceptually the link be- 
tween filter banks and wavelets, as shown by Mallat [179, 180, 181] and Daubechies 
[71, 73]. This connection has led to a renewed interest in filter banks, especially 
with the work of Daubechies who first constructed wavelets from filter banks [71] 
and Mallat who showed that a wavelet series expansion could be implemented with 
filter banks [181]. Recent work on this topic includes [117, 240, 319]. 

As can be seen from the above short historical discussion, there are two different 
points of view on the subject, namely, expansion of signals in terms of structured 
bases, and perfect reconstruction filter banks. While the two are equivalent, the 
former is more in tune with Fourier and wavelet theory, while the latter is central 
to the construction of implementable systems. In what follows, we use both points 
of view, using whichever is more appropriate to explain the material. 

The outline of the chapter is as follows: First, we review discrete-time series
expansions, and consider two cases in some detail, namely the Haar and the sinc
bases. They are two extreme cases of two-channel filter banks. The general two-channel filter bank is studied in detail in Section 3.2, where both the expansion and
the more traditional filter bank point of view are given. The orthogonal case with 
finite-length basis functions or finite impulse response (FIR) filters is thoroughly 
studied. The biorthogonal FIR case, in particular with linear phase filters (sym- 
metric or antisymmetric basis functions), is considered, and the infinite impulse 
response (IIR) filter case (which corresponds to basis functions with exponential 
decay) is given as well. 

In Section 3.3, the study of filter banks with more than two channels starts 
with tree-structured filter banks. In particular, a constant relative bandwidth 
(or constant-Q) tree is shown to compute a discrete-time wavelet series. Such a 
transform has a multiresolution property that provides an important framework for 
wavelet transforms. More general filter bank trees, also known as wavelet packets, 
are presented as well. 

Filter banks with N channels are treated next. The two particular cases of block
transforms and lapped orthogonal transforms are discussed first, leading to the
analysis of general N-channel filter banks. An important case, namely modulated
filter banks, is studied in detail, both because of its relation to short-time Fourier- 
like expansions, and because of its computational efficiency. 

Overcomplete discrete-time expansions are discussed in Section 3.5. The pyra- 
mid decomposition is studied, as well as the classic overlap-add/save algorithm for 
convolution computation which is a filter bank algorithm. 
Multidimensional expansions and filter banks are derived in Section 3.6. Both
separable and nonseparable systems are considered. In the nonseparable case, the 
focus is mostly on two-channel decompositions, while more general cases are indi- 
cated as well. 

Section 3.7 discusses a scheme that has received less attention in the filter bank 
literature, but is nonetheless very important in applications, and is called a trans- 
multiplexer. It is dual to the analysis/synthesis scheme used in compression appli- 
cations, and is used in telecommunications. 

The two appendices contain more details on orthogonal solutions and their fac- 
torizations as well as on multidimensional sampling. 

The material in this chapter covers filter banks at a level of detail which is 
adequate for the remainder of the book. For a more exhaustive treatment of filter 
banks, we refer the reader to the text by Vaidyanathan [308]. Discussions of fil- 
ter banks and multiresolution signal processing are also contained in the book by 
Akansu and Haddad [3]. 

3.1 Series Expansions of Discrete-Time Signals 

We start by recalling some general properties of discrete-time expansions. Then, we 
discuss a very simple structured expansion called the Haar expansion, and give its 
filter bank implementation. The dual of the Haar expansion, the sinc expansion,
is examined as well. These two examples are extreme cases of filter bank expansions
and set the stage for solutions that lie in between. 

Discrete-time series expansions come in various flavors, which we briefly review
(see also Sections 2.2.3-2.2.5). As usual, $x[n]$ is an arbitrary square-summable
sequence, or $x[n] \in \ell_2(\mathbb{Z})$. First, orthonormal expansions of signals $x[n]$ from $\ell_2(\mathbb{Z})$
are of the form

$$x[n] = \sum_{k\in\mathbb{Z}} \langle \varphi_k[l], x[l] \rangle\, \varphi_k[n] = \sum_{k\in\mathbb{Z}} X[k]\, \varphi_k[n], \qquad (3.1.1)$$

where

$$X[k] = \langle \varphi_k[l], x[l] \rangle = \sum_{l} \varphi_k^*[l]\, x[l] \qquad (3.1.2)$$

is the transform of $x[n]$. The basis functions $\varphi_k$ satisfy the orthonormality$^1$ constraint

$$\langle \varphi_k[n], \varphi_l[n] \rangle = \delta[k-l],$$

and the set of basis functions is complete, so that every signal from $\ell_2(\mathbb{Z})$ can
be expressed using (3.1.1). An important property of orthonormal expansions is
conservation of energy,

$$\|x\|^2 = \|X\|^2.$$

1 The first constraint is orthogonality between basis vectors. Then, normalization leads to
orthonormality. The terms "orthogonal" and "orthonormal" will often be used interchangeably,
unless we want to insist on the normalization and then use the latter.

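Biorthogonal expansions, on the other hand, are given as

$$\begin{aligned} x[n] &= \sum_{k\in\mathbb{Z}} \langle \tilde\varphi_k[l], x[l] \rangle\, \varphi_k[n] = \sum_{k\in\mathbb{Z}} \tilde X[k]\, \varphi_k[n] \qquad (3.1.3)\\ &= \sum_{k\in\mathbb{Z}} \langle \varphi_k[l], x[l] \rangle\, \tilde\varphi_k[n] = \sum_{k\in\mathbb{Z}} X[k]\, \tilde\varphi_k[n], \end{aligned}$$

where

$$X[k] = \langle \varphi_k[l], x[l] \rangle \quad\text{and}\quad \tilde X[k] = \langle \tilde\varphi_k[l], x[l] \rangle$$

are the transform coefficients of $x[n]$ with respect to $\{\tilde\varphi_k\}$ and $\{\varphi_k\}$. The dual bases
$\{\varphi_k\}$ and $\{\tilde\varphi_k\}$ satisfy the biorthogonality constraint

$$\langle \varphi_k[n], \tilde\varphi_l[n] \rangle = \delta[k-l].$$

Note that in this case, conservation of energy does not hold. For stability of the
expansion, the transform coefficients have to satisfy

$$A \sum_{k} |X[k]|^2 \le \|x\|^2 \le B \sum_{k} |X[k]|^2$$

for some constants $0 < A \le B < \infty$, with a similar relation for the coefficients $\tilde X[k]$. In the biorthogonal case, the energy of the signal can instead be expressed as

$$\|x\|^2 = \langle X[k], \tilde X[k] \rangle.$$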

Finally, overcomplete expansions can be of the form (3.1.1) or (3.1.3), but with
redundant sets of functions, that is, the functions $\varphi_k[n]$ used in the expansions are
not linearly independent.

3.1.1 Discrete-Time Fourier Series

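The discrete-time Fourier transform (see also Section 2.4.6) is given by

$$x[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega, \qquad (3.1.4)$$

$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n}. \qquad (3.1.5)$$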
It is a series expansion of the $2\pi$-periodic function $X(e^{j\omega})$ as given by (3.1.5), while
$x[n]$ is written in terms of an integral of $X(e^{j\omega})$, a function of the continuous variable $\omega$. While
this is an important tool in the analysis of discrete-time signals and systems [211],
the fact that the synthesis of $x[n]$ given by (3.1.4) involves integration rather than
series expansion makes it of limited practical use. An example of a series expansion
is the discrete-time Fourier series



$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{j2\pi kn/N}, \qquad (3.1.6)$$

$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N},$$

where $x[n]$ is either periodic ($n \in \mathbb{Z}$) or of finite length ($n = 0, 1, \ldots, N-1$). In
the latter case, the above is often called the discrete Fourier transform (DFT).
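The pair (3.1.6) can be checked numerically with a short sketch. The version below assumes plain Python lists and evaluates the two sums directly (O(N^2) operations, no FFT); the names `dft` and `idft` are illustrative. It also computes the norm of one basis function $\frac{1}{N} e^{j2\pi kn/N}$, which comes out as $1/\sqrt{N}$ with this normalization.

```python
import cmath
import math


def dft(x):
    """Analysis half of (3.1.6): X[k] = sum_n x[n] e^{-j 2 pi k n / N}."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Synthesis half of (3.1.6): x[n] = (1/N) sum_k X[k] e^{j 2 pi k n / N}."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


x = [1.0, 2.0, 0.0, -1.0]
x_rec = idft(dft(x))  # equals x up to rounding error

# Norm of the basis function phi_1[n] = (1/N) e^{j 2 pi n / N}, n = 0..N-1
N = len(x)
phi_1 = [cmath.exp(2j * cmath.pi * n / N) / N for n in range(N)]
norm_phi_1 = math.sqrt(sum(abs(v) ** 2 for v in phi_1))  # equals 1/sqrt(N)
```

In practice an FFT routine would replace the direct sums; they are kept here only so that each line mirrors (3.1.6).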

Because it only applies to such restricted types of signals, the Fourier series
is somewhat limited in its applications. Since the basis functions are complex
exponentials

$$\varphi_k[n] = \begin{cases} \frac{1}{N}\, e^{j2\pi kn/N}, & n = 0, 1, \ldots, N-1,\\ 0, & \text{otherwise}, \end{cases}$$

for the finite-length case (or the periodic extension in the periodic case), there is no
decay of the basis function over the length-$N$ window, that is, no time localization
(note that $\|\varphi_k\| = 1/\sqrt{N}$ in the above definition).

In order to expand arbitrary sequences we can segment the signal and obtain a
piecewise Fourier series (one for each segment). Simply segment the sequence $x[n]$
into subsequences $x^{(i)}[n]$ such that

$$x^{(i)}[n] = \begin{cases} x[n], & n = iN + l,\quad l = 0, 1, \ldots, N-1,\quad i \in \mathbb{Z},\\ 0, & \text{otherwise}, \end{cases} \qquad (3.1.7)$$

and take the discrete Fourier transform of each subsequence independently,

$$X^{(i)}[k] = \sum_{l=0}^{N-1} x^{(i)}[iN + l]\, e^{-j2\pi kl/N}, \qquad k = 0, 1, \ldots, N-1. \qquad (3.1.8)$$

Reconstruction of $x[n]$ from $X^{(i)}[k]$ is obvious. Recover $x^{(i)}[n]$ by inverting (3.1.8)
(see also (3.1.6)) and then get $x[n]$ following (3.1.7) by juxtaposing the various
$x^{(i)}[n]$. This leads to

$$x[n] = \sum_{i=-\infty}^{\infty} \sum_{k=0}^{N-1} X^{(i)}[k]\, \varphi_k^{(i)}[n],$$
where

$$\varphi_k^{(i)}[n] = \begin{cases} \frac{1}{N}\, e^{j2\pi kn/N}, & n = iN + l,\quad l = 0, 1, \ldots, N-1,\\ 0, & \text{otherwise}. \end{cases}$$

The $\varphi_k^{(i)}[n]$ are simply the basis functions of the DFT shifted to the appropriate
interval $[iN, \ldots, (i+1)N - 1]$.
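A minimal sketch of the block expansion, assuming a finite signal that starts at n = 0 and whose length is a multiple of N (the text works with infinite sequences and blocks indexed by $i \in \mathbb{Z}$). The helper names are hypothetical; each block is transformed on its own, as in (3.1.8), and reconstruction juxtaposes the inverted blocks.

```python
import cmath


def dft(x):
    """Length-N DFT, the analysis formula of (3.1.6)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Inverse length-N DFT, the synthesis formula of (3.1.6)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def block_dft(x, N):
    """Cut x into consecutive length-N blocks, as in (3.1.7), and DFT each one, as in (3.1.8)."""
    if len(x) % N:
        raise ValueError("len(x) must be a multiple of N in this sketch")
    return [dft(x[i:i + N]) for i in range(0, len(x), N)]


def inverse_block_dft(blocks):
    """Invert every block and juxtapose the pieces to rebuild x[n]."""
    out = []
    for X_i in blocks:
        out.extend(idft(X_i))
    return out


x = [3.0, -1.0, 2.0, 0.5, 1.0, 1.0, -2.0, 4.0]
blocks = block_dft(x, 4)
x_rec = inverse_block_dft(blocks)

# Time locality: changing a sample in block 1 leaves block 0's coefficients unchanged.
y = list(x)
y[5] = 10.0
blocks_y = block_dft(y, 4)
block0_unchanged = all(abs(p - q) < 1e-12 for p, q in zip(blocks_y[0], blocks[0]))
```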

The above expansion is called a block discrete-time Fourier series, since the
signal is divided into blocks of size $N$, which are then Fourier transformed. In
matrix notation, the overall transform is given by a block diagonal
matrix, where each block is an $N \times N$ Fourier matrix $F_N$,

$$\begin{pmatrix} \vdots \\ X^{(-1)} \\ X^{(0)} \\ X^{(1)} \\ \vdots \end{pmatrix} = \begin{pmatrix} \ddots & & & & \\ & F_N & & & \\ & & F_N & & \\ & & & F_N & \\ & & & & \ddots \end{pmatrix} \begin{pmatrix} \vdots \\ x^{(-1)} \\ x^{(0)} \\ x^{(1)} \\ \vdots \end{pmatrix},$$

and $X^{(i)}$, $x^{(i)}$ are size-$N$ vectors. Up to a scale factor of $1/\sqrt{N}$ (see (3.1.6)), this is
a unitary transform. This transform is not shift-invariant in general, that is, if $x[n]$
has transform $X[k]$, then $x[n-l]$ does not necessarily have the transform $X[k-l]$.
However, it can be seen that



$$x[n - lN] \;\longleftrightarrow\; X[k - lN]. \qquad (3.1.9)$$

That is, the transform is periodically time-varying with period $N$.$^2$ Note that we
have achieved a certain time locality. Components of the signal that exist only in
an interval $[iN, \ldots, (i+1)N - 1]$ will only influence transform coefficients in the same
interval. Finally, the basis functions in this block transform are naturally divided
into size-$N$ subsets, with no overlaps between subsets, that is



$$\langle \varphi_k^{(i)}[n], \varphi_l^{(m)}[n] \rangle = 0, \qquad i \ne m,$$



simply because the supports of the basis functions are disjoint. This abrupt change 
between intervals, and the fact that the interval length and position are arbitrary, 
are the drawbacks of this block DTFS. 
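The periodic time variance (3.1.9) can be seen on a small example. The sketch below assumes a finite frame with a zero tail (only so that delays stay inside the frame) and compares the block coefficients of $x[n]$, $x[n-N]$ and $x[n-1]$ for $N = 4$: the shift by $N$ just moves whole coefficient blocks, while the shift by one sample changes what each block contains.

```python
import cmath


def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def block_dft(x, N):
    """Block DTFS: DFT of each consecutive length-N block (len(x) must be a multiple of N)."""
    return [dft(x[i:i + N]) for i in range(0, len(x), N)]


def delay(x, m):
    """Shift x right by m samples inside a fixed-length frame; assumes the last m samples are zero."""
    if any(v != 0 for v in x[len(x) - m:]):
        raise ValueError("this sketch needs a zero tail of length m")
    return [0.0] * m + x[:len(x) - m]


N = 4
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
B = block_dft(x, N)
B_shift_N = block_dft(delay(x, N), N)   # delay by one full block
B_shift_1 = block_dft(delay(x, 1), N)   # delay by one sample

# Shift by N: the coefficient blocks simply move down by one block, as in (3.1.9).
same_blocks = all(abs(p - q) < 1e-12
                  for i in range(len(B) - 1)
                  for p, q in zip(B_shift_N[i + 1], B[i]))
# Shift by 1: block 0 now holds [0, 1, 2, 3] instead of [1, 2, 3, 4], so its DC term changes.
dc_changed = abs(B_shift_1[0][0] - B[0][0]) > 1e-6
```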

In this chapter, we will extend the idea of block transforms in order to address 
these drawbacks, and this will be done using filter banks. But first, we turn our 
attention to the simplest block transform case, when N = 2. This is followed by 
the simplest filter bank case, when the filters are ideal sinc filters. The general case,
to which these are a prelude, lies between these extremes. 



2 Another way to say this is that the "shift by N" and the size-N block transform operators
commute. 
3.1.2 Haar Expansion of Discrete-Time Signals

The Haar basis, while very simple, should nonetheless highlight key features such as 
periodic time variance and the relation with filter bank implementations. The basic 
unit is a two-point average and difference operation. While this is a 2 x 2 unitary 
transform that could be called a DFT just as well, we refer to it as the elementary 
Haar basis because we will see that its suitable iteration will lead to both the 
discrete-time Haar decomposition (in Section 3.3) as well as the continuous-time 
Haar wavelet (in Chapter 4). 

The basis functions in the Haar case are given by

$$\varphi_{2k}[n] = \begin{cases} \frac{1}{\sqrt{2}}, & n = 2k, 2k+1,\\ 0, & \text{otherwise}, \end{cases} \qquad \varphi_{2k+1}[n] = \begin{cases} \frac{1}{\sqrt{2}}, & n = 2k,\\ -\frac{1}{\sqrt{2}}, & n = 2k+1,\\ 0, & \text{otherwise}. \end{cases} \qquad (3.1.10)$$

It follows that the even-indexed basis functions are translates of each other, and so
are the odd-indexed ones, or

$$\varphi_{2k}[n] = \varphi_0[n - 2k], \qquad \varphi_{2k+1}[n] = \varphi_1[n - 2k]. \qquad (3.1.11)$$

The transform is

$$X[2k] = \langle \varphi_{2k}, x \rangle = \frac{1}{\sqrt{2}} \left( x[2k] + x[2k+1] \right), \qquad (3.1.12)$$

$$X[2k+1] = \langle \varphi_{2k+1}, x \rangle = \frac{1}{\sqrt{2}} \left( x[2k] - x[2k+1] \right). \qquad (3.1.13)$$

The reconstruction is obtained from

$$x[n] = \sum_{k\in\mathbb{Z}} X[k]\, \varphi_k[n], \qquad (3.1.14)$$

as usual for an orthonormal basis. Let us prove that the set $\{\varphi_k[n]\}$ given in (3.1.10)
is an orthonormal basis for $\ell_2(\mathbb{Z})$. While the proof is straightforward in this simple
case, we indicate it for two reasons. First, it is easy to extend it to any block
transform, and second, the method of the proof can be used in more general cases
as well.
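The Haar transform (3.1.12)-(3.1.13) and its reconstruction (3.1.14) fit in a few lines. This sketch assumes a finite, even-length signal indexed from n = 0 rather than a sequence in $\ell_2(\mathbb{Z})$; `haar_analysis`, `haar_basis` and `haar_synthesis` are illustrative names. The last two lines compute both sides of $\|x\|^2 = \|X\|^2$.

```python
import math

S = 1 / math.sqrt(2)


def haar_analysis(x):
    """Coefficients (3.1.12)-(3.1.13), interleaved as X[0], X[1], X[2], ...; len(x) must be even."""
    X = []
    for k in range(len(x) // 2):
        a, b = x[2 * k], x[2 * k + 1]
        X.extend([S * (a + b), S * (a - b)])
    return X


def haar_basis(k, n):
    """phi_k[n] from (3.1.10), for k >= 0."""
    m, odd = divmod(k, 2)
    if n == 2 * m:
        return S
    if n == 2 * m + 1:
        return -S if odd else S
    return 0.0


def haar_synthesis(X):
    """Reconstruction (3.1.14): x[n] = sum_k X[k] phi_k[n], for n = 0..len(X)-1."""
    return [sum(X[k] * haar_basis(k, n) for k in range(len(X))) for n in range(len(X))]


x = [3.0, 1.0, -2.0, 5.0]
X = haar_analysis(x)
x_rec = haar_synthesis(X)
energy_x = sum(v * v for v in x)
energy_X = sum(v * v for v in X)   # equal to energy_x, by conservation of energy
```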

Proposition 3.1

The set of functions as given in (3.1.10) is an orthonormal basis for signals
from $\ell_2(\mathbb{Z})$.

Proof

To check that the set of basis functions $\{\varphi_k\}_{k\in\mathbb{Z}}$ indeed constitutes an orthonormal basis
for signals from $\ell_2(\mathbb{Z})$, we have to verify that:

(a) $\{\varphi_k\}_{k\in\mathbb{Z}}$ is an orthonormal family.

(b) $\{\varphi_k\}_{k\in\mathbb{Z}}$ is complete.

Consider (a). We want to show that $\langle \varphi_k, \varphi_l \rangle = \delta[k-l]$. Take $k$ even, $k = 2i$. Then, for $l$
smaller than $2i$ or larger than $2i+1$, the inner product is automatically zero since the basis
functions do not overlap. For $l = 2i$, we have

$$\langle \varphi_{2i}, \varphi_{2i} \rangle = \varphi_{2i}^2[2i] + \varphi_{2i}^2[2i+1] = \frac{1}{2} + \frac{1}{2} = 1.$$

For $l = 2i+1$, we get

$$\langle \varphi_{2i}, \varphi_{2i+1} \rangle = \varphi_{2i}[2i]\, \varphi_{2i+1}[2i] + \varphi_{2i}[2i+1]\, \varphi_{2i+1}[2i+1] = \frac{1}{2} - \frac{1}{2} = 0.$$

A similar argument can be followed for odd $l$'s, and thus, orthonormality is proven. Now
consider (b). We have to demonstrate that any signal belonging to $\ell_2(\mathbb{Z})$ can be expanded
using (3.1.14). This is equivalent to showing that there exists no $x[n]$ with $\|x\| > 0$ that
has a zero expansion, that is, such that $\langle \varphi_k, x \rangle = 0$ for all $k$. To prove this,
suppose it is not true, that is, suppose that there exists an $x[n]$ with $\|x\| > 0$ such that
$\langle \varphi_k, x \rangle = 0$ for all $k$. Thus

$$\|\langle \varphi_k, x \rangle\|^2 = \sum_{k\in\mathbb{Z}} |\langle \varphi_k, x \rangle|^2 = \sum_{k\in\mathbb{Z}} \Bigl| \sum_{n} \varphi_k[n]\, x[n] \Bigr|^2 = 0. \qquad (3.1.15)$$

Since the last sum consists of nonnegative terms, (3.1.15) is possible if and only if

$$X[k] = \langle \varphi_k[n], x[n] \rangle = 0, \qquad \text{for all } k.$$

First, take $k$ even, and consider $X[2k] = 0$. Because of (3.1.12), it means that $x[2k] = -x[2k+1]$ for all $k$. Now take the odd $k$'s, and look at $X[2k+1] = 0$. From (3.1.13), it
follows that $x[2k] = x[2k+1]$ for all $k$. Thus, the only solution to the above two requirements
is $x[2k] = x[2k+1] = 0$, which contradicts our assumption. This shows that there is
no sequence $x[n]$ with $\|x\| > 0$ such that $\|X\| = 0$, and proves completeness.
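Part (a) of the proof can be mirrored numerically on a finite window. The sketch below (window length 8, an arbitrary choice) builds the first eight Haar basis functions of (3.1.10) and checks that their Gram matrix of inner products is the identity. Eight orthonormal vectors in an eight-dimensional space automatically span it, which is a finite-dimensional analog of part (b), not a substitute for the proof on $\ell_2(\mathbb{Z})$.

```python
import math

S = 1 / math.sqrt(2)


def haar_basis_vector(k, length):
    """phi_k[n] of (3.1.10) sampled on n = 0..length-1 (requires k < length, length even)."""
    v = [0.0] * length
    m, odd = divmod(k, 2)
    v[2 * m] = S
    v[2 * m + 1] = -S if odd else S
    return v


def gram_matrix(vectors):
    """Matrix of pairwise inner products <phi_k, phi_l> for real-valued vectors."""
    return [[sum(a * b for a, b in zip(u, w)) for w in vectors] for u in vectors]


length = 8
basis = [haar_basis_vector(k, length) for k in range(length)]
G = gram_matrix(basis)
is_identity = all(abs(G[k][l] - (1.0 if k == l else 0.0)) < 1e-12
                  for k in range(length) for l in range(length))
```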

Now, we would like to show how the expansion (3.1.12)-(3.1.14) can be implemented
using convolutions, thus leading to filter banks. Consider the filter $h_0[n]$ with the
following impulse response:

$$h_0[n] = \begin{cases} \frac{1}{\sqrt{2}}, & n = -1, 0,\\ 0, & \text{otherwise}. \end{cases} \qquad (3.1.16)$$

Note that this is a noncausal filter. Then, $X[2k]$ in (3.1.12) is the result of the
convolution of $h_0[n]$ with $x[n]$ at instant $2k$ since

$$h_0[n] * x[n]\big|_{n=2k} = \sum_{l\in\mathbb{Z}} h_0[2k - l]\, x[l] = \frac{1}{\sqrt{2}}\, x[2k] + \frac{1}{\sqrt{2}}\, x[2k+1] = X[2k].$$
[Figure 3.1: (a) block diagram with analysis and synthesis sections; (b) spectrum splitting.]

Figure 3.1 Two-channel filter bank with analysis filters $h_0[n]$, $h_1[n]$ and synthesis
filters $g_0[n]$, $g_1[n]$. If the filter bank implements an orthonormal transform,
then $g_0[n] = h_0[-n]$ and $g_1[n] = h_1[-n]$. (a) Block diagram. (b) Spectrum
splitting performed by the filter bank.



Similarly, by defining the filter $h_1[n]$ with the impulse response

$$h_1[n] = \begin{cases} \frac{1}{\sqrt{2}}, & n = 0,\\ -\frac{1}{\sqrt{2}}, & n = -1,\\ 0, & \text{otherwise}, \end{cases} \qquad (3.1.17)$$

we obtain that $X[2k+1]$ in (3.1.13) follows from

$$h_1[n] * x[n]\big|_{n=2k} = \sum_{l\in\mathbb{Z}} h_1[2k - l]\, x[l] = \frac{1}{\sqrt{2}}\, x[2k] - \frac{1}{\sqrt{2}}\, x[2k+1] = X[2k+1].$$

We recall (from Section 2.5.3) that evaluating a convolution at even indexes corresponds to a filter followed by downsampling by 2. Therefore, $X[2k]$ and $X[2k+1]$
can be obtained from a two-channel filter bank, with filters $h_0[n]$ and $h_1[n]$, followed
by downsampling by 2, as shown in the left half of Figure 3.1(a). This is called an
analysis filter bank. Often, we will specifically label the channel signals as $y_0$ and
$y_1$, where

$$y_0[k] = X[2k], \qquad y_1[k] = X[2k+1].$$
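The analysis filter bank can be sketched by evaluating the convolutions only at even instants. Filters are stored here as Python dicts from tap index to value, a design choice that lets the noncausal taps at $n = -1$ be written directly; the input is assumed to be a finite, even-length list starting at n = 0. The outputs are compared against the direct formulas (3.1.12)-(3.1.13).

```python
import math

S = 1 / math.sqrt(2)
h0 = {-1: S, 0: S}    # (3.1.16), noncausal: taps at n = -1 and n = 0
h1 = {-1: -S, 0: S}   # (3.1.17)


def conv_at(h, x, n):
    """(h * x)[n] = sum_l h[n - l] x[l], for x given as a list on l = 0..len(x)-1."""
    return sum(h.get(n - l, 0.0) * x[l] for l in range(len(x)))


def analysis_bank(x):
    """Filter with h0 and h1, then keep only even-indexed outputs (downsampling by 2)."""
    K = len(x) // 2
    y0 = [conv_at(h0, x, 2 * k) for k in range(K)]
    y1 = [conv_at(h1, x, 2 * k) for k in range(K)]
    return y0, y1


x = [3.0, 1.0, -2.0, 5.0]
y0, y1 = analysis_bank(x)
# Direct formulas (3.1.12)-(3.1.13) for comparison
y0_direct = [S * (x[2 * k] + x[2 * k + 1]) for k in range(2)]
y1_direct = [S * (x[2 * k] - x[2 * k + 1]) for k in range(2)]
```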
It is important to note that the impulse responses of the analysis filters are time-reversed versions of the basis functions,

$$h_0[n] = \varphi_0[-n], \qquad h_1[n] = \varphi_1[-n],$$

since convolution is an inner product involving time reversal. Also, the filters we
defined in (3.1.16) and (3.1.17) are noncausal, which is to be expected since, for
example, the computation of $X[2k]$ in (3.1.12) involves $x[2k+1]$, that is, a future
sample. To summarize this discussion, it is easiest to visualize the analysis in matrix
notation as



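$$\begin{pmatrix} \vdots \\ y_0[0] \\ y_1[0] \\ y_0[1] \\ y_1[1] \\ \vdots \end{pmatrix} = \begin{pmatrix} \vdots \\ X[0] \\ X[1] \\ X[2] \\ X[3] \\ \vdots \end{pmatrix} = \begin{pmatrix} \ddots & & & & & \\ & h_0[0] & h_0[-1] & & & \\ & h_1[0] & h_1[-1] & & & \\ & & & h_0[0] & h_0[-1] & \\ & & & h_1[0] & h_1[-1] & \\ & & & & & \ddots \end{pmatrix} \begin{pmatrix} \vdots \\ x[0] \\ x[1] \\ x[2] \\ x[3] \\ \vdots \end{pmatrix}, \qquad (3.1.18)$$

where the four rows shown are $\varphi_0[n]$, $\varphi_1[n]$, $\varphi_2[n]$ and $\varphi_3[n]$, and we again see the shift property of the basis functions (see (3.1.11)). We can
verify the shift invariance of the analysis with respect to even shifts. If $x'[n] = x[n-2l]$, then

$$X'[2k] = \frac{1}{\sqrt{2}} \left( x'[2k] + x'[2k+1] \right) = \frac{1}{\sqrt{2}} \left( x[2k-2l] + x[2k+1-2l] \right) = X[2k-2l]$$

and similarly for $X'[2k+1]$, which equals $X[2k+1-2l]$, thus verifying (3.1.9).
This does not hold for odd shifts, however. For example, $\delta[n]$ has the transform
$(\delta[n] + \delta[n-1])/\sqrt{2}$ while $\delta[n-1]$ leads to $(\delta[n] - \delta[n-1])/\sqrt{2}$.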
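The example of $\delta[n]$ versus $\delta[n-1]$ can be reproduced with the Haar analysis on a length-4 frame (the frame length is an assumption of the sketch). An even shift moves the coefficient sequence by the same amount, while an odd shift flips the sign of the difference coefficient instead of shifting anything.

```python
import math

S = 1 / math.sqrt(2)


def haar_analysis(x):
    """Interleaved Haar coefficients X[0], X[1], ... from (3.1.12)-(3.1.13); len(x) even."""
    X = []
    for k in range(len(x) // 2):
        X.extend([S * (x[2 * k] + x[2 * k + 1]), S * (x[2 * k] - x[2 * k + 1])])
    return X


delta = [1.0, 0.0, 0.0, 0.0]      # delta[n] on n = 0..3
delta_1 = [0.0, 1.0, 0.0, 0.0]    # delta[n - 1]
delta_2 = [0.0, 0.0, 1.0, 0.0]    # delta[n - 2]

X = haar_analysis(delta)      # [S, S, 0, 0], i.e. (delta[k] + delta[k-1]) / sqrt(2)
X_1 = haar_analysis(delta_1)  # [S, -S, 0, 0]: not a shifted copy of X
X_2 = haar_analysis(delta_2)  # [0, 0, S, S]: X shifted by 2, as (3.1.9) predicts
```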

What about the synthesis or reconstruction given by (3.1.14)? Define two filters
$g_0$ and $g_1$ with impulse responses equal to the basis functions $\varphi_0$ and $\varphi_1$

$$g_0[n] = \varphi_0[n], \qquad g_1[n] = \varphi_1[n]. \qquad (3.1.19)$$

Therefore

$$\varphi_{2k}[n] = g_0[n - 2k], \qquad \varphi_{2k+1}[n] = g_1[n - 2k], \qquad (3.1.20)$$
following (3.1.11). Then (3.1.14) becomes, using (3.1.19) and (3.1.20),

$$\begin{aligned} x[n] &= \sum_{k\in\mathbb{Z}} y_0[k]\, \varphi_{2k}[n] + \sum_{k\in\mathbb{Z}} y_1[k]\, \varphi_{2k+1}[n] \qquad (3.1.21)\\ &= \sum_{k\in\mathbb{Z}} y_0[k]\, g_0[n - 2k] + \sum_{k\in\mathbb{Z}} y_1[k]\, g_1[n - 2k]. \qquad (3.1.22) \end{aligned}$$

That is, each sample from $y_i[k]$ adds a copy of the impulse response of $g_i[n]$ shifted by
$2k$. This can be implemented by an upsampling by 2 (inserting a zero between every
two samples of $y_i[k]$) followed by a convolution with $g_i[n]$ (see also Section 2.5.3).
This is shown in the right side of Figure 3.1(a), and is called a synthesis filter bank.
What we have just explained is a way of implementing a structured orthogonal
expansion by means of filter banks. We summarize two characteristics of the filters
which will hold in general orthogonal cases as well.

(a) The impulse responses of the synthesis filters equal the first two basis functions

$$g_i[n] = \varphi_i[n], \qquad i = 0, 1.$$

(b) The impulse responses of the analysis filters are the time-reversed versions of
the synthesis ones

$$h_i[n] = g_i[-n], \qquad i = 0, 1.$$
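The synthesis bank (3.1.22) can be sketched as upsampling followed by filtering with $g_i$. The block below builds the analysis filters from property (b), $h_i[n] = g_i[-n]$, so both properties above are exercised together; the dict representation of filters and the finite even-length input starting at n = 0 are assumptions of the sketch. Feeding the analysis outputs into the synthesis bank returns the input.

```python
import math

S = 1 / math.sqrt(2)
g0 = {0: S, 1: S}     # synthesis filters equal the basis functions phi_0, phi_1 (3.1.19)
g1 = {0: S, 1: -S}
h0 = {-n: v for n, v in g0.items()}   # property (b): h_i[n] = g_i[-n]
h1 = {-n: v for n, v in g1.items()}


def conv_at(h, u, n):
    """(h * u)[n] = sum_m h[n - m] u[m] for u given as a list on m = 0..len(u)-1."""
    return sum(h.get(n - m, 0.0) * u[m] for m in range(len(u)))


def upsample(y):
    """Insert a zero after every sample: u[2k] = y[k], u[2k+1] = 0."""
    u = []
    for v in y:
        u.extend([v, 0.0])
    return u


def analysis(x):
    """Filter with h0, h1 and keep the even-indexed outputs."""
    K = len(x) // 2
    return ([conv_at(h0, x, 2 * k) for k in range(K)],
            [conv_at(h1, x, 2 * k) for k in range(K)])


def synthesis(y0, y1):
    """(3.1.22): upsample each channel, filter with g_i, and add the two branches."""
    u0, u1 = upsample(y0), upsample(y1)
    return [conv_at(g0, u0, n) + conv_at(g1, u1, n) for n in range(len(u0))]


x = [3.0, 1.0, -2.0, 5.0, 0.5, -1.5]
x_rec = synthesis(*analysis(x))
```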

What about the signal processing properties of our decomposition? From (3.1.12)
and (3.1.13), we recall that one channel computes the average and the other the
difference of two successive samples. While these are not the "best possible" lowpass and highpass filters (they have, however, good time localization), they lead to
an important interpretation. The reconstruction from $y_0[k]$ (that is, the first sum
in (3.1.21)) is the orthogonal projection of the input onto the subspace spanned by
$\{\varphi_{2k}[n]\}$, that is, an average or coarse version of $x[n]$. Calling it $x_0$, it equals

$$x_0[2k] = x_0[2k+1] = \frac{1}{2} \left( x[2k] + x[2k+1] \right).$$

The other sum in (3.1.21), which is the reconstruction from $y_1[k]$, is the orthogonal
projection onto the subspace spanned by $\{\varphi_{2k+1}[n]\}$. Denoting it by $x_1$, it is given by

$$x_1[2k] = \frac{1}{2} \left( x[2k] - x[2k+1] \right), \qquad x_1[2k+1] = -x_1[2k].$$

This is the difference or added detail necessary to reconstruct $x[n]$ from its coarse
version $x_0[n]$. The two subspaces spanned by $\{\varphi_{2k}\}$ and $\{\varphi_{2k+1}\}$ are orthogonal
and the sum of the two projections recovers $x[n]$ perfectly, since summing $x_0[2k] + x_1[2k]$ yields $x[2k]$ and similarly $x_0[2k+1] + x_1[2k+1]$ gives $x[2k+1]$.
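The two projections $x_0$ and $x_1$ can be computed channel by channel, as in the sketch below (finite even-length input assumed; `haar_projections` is a hypothetical helper). For x = [4, 2, 1, 3] it gives $x_0 = [3, 3, 2, 2]$ and $x_1 = [1, -1, -1, 1]$ up to rounding, whose sum is x and whose inner product is zero.

```python
import math

S = 1 / math.sqrt(2)


def haar_projections(x):
    """Return (x0, x1): the parts of x rebuilt from y0 alone and from y1 alone (len(x) even)."""
    x0, x1 = [], []
    for k in range(len(x) // 2):
        a, b = x[2 * k], x[2 * k + 1]
        y0, y1 = S * (a + b), S * (a - b)
        x0.extend([S * y0, S * y0])    # y0[k] * phi_{2k}[n] on n = 2k, 2k+1
        x1.extend([S * y1, -S * y1])   # y1[k] * phi_{2k+1}[n] on n = 2k, 2k+1
    return x0, x1


x = [4.0, 2.0, 1.0, 3.0]
x0, x1 = haar_projections(x)
recovered = [a + b for a, b in zip(x0, x1)]   # equals x up to rounding
cross = sum(a * b for a, b in zip(x0, x1))   # about 0: the projections are orthogonal
```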
3.1.3 Sinc Expansion of Discrete-Time Signals

Although remarkably simple, the Haar basis suffers from an important drawback:
the frequency resolution of its basis functions (filters) is not very good. We
now look at a basis which uses ideal half-band lowpass and highpass filters. The
frequency selectivity is ideal (out-of-band signals are perfectly rejected), but the
time localization suffers (the filter impulse response is infinite, and decays only
proportionally to $1/n$).

Let us start with an ideal half-band lowpass filter $g_0[n]$, defined by its $2\pi$-periodic discrete-time Fourier transform $G_0(e^{j\omega}) = \sqrt{2}$ for $\omega \in [-\pi/2, \pi/2]$ and $G_0(e^{j\omega}) = 0$ for
$\omega \in [\pi/2, 3\pi/2]$. The scale factor is so chosen that $\|G_0\|^2 = 2\pi$, or $\|g_0\| = 1$, following
Parseval's relation for the DTFT. The inverse DTFT yields

$$g_0[n] = \frac{\sqrt{2}}{2\pi} \int_{-\pi/2}^{\pi/2} e^{j\omega n}\, d\omega = \frac{1}{\sqrt{2}}\, \frac{\sin(\pi n/2)}{\pi n/2}. \qquad (3.1.23)$$

Note that $g_0[2n] = \frac{1}{\sqrt{2}}\, \delta[n]$. As the highpass filter, choose a modulated version
of $g_0[n]$, with a twist, namely a time reversal and a shift by one

$$g_1[n] = (-1)^n\, g_0[-n+1]. \qquad (3.1.24)$$

While the time reversal is only formal here (since $g_0[n]$ is symmetric in $n$), the
shift by one is important for the completeness of the highpass and lowpass impulse
responses in the space of square-summable sequences.

Just as in the Haar case, the basis functions are obtained from the filter impulse
responses and their even shifts,

$$\varphi_{2k}[n] = g_0[n - 2k], \qquad \varphi_{2k+1}[n] = g_1[n - 2k], \qquad (3.1.25)$$

and the coefficients of the expansion $\langle \varphi_{2k}, x \rangle$ and $\langle \varphi_{2k+1}, x \rangle$ are obtained by filtering
with $h_0[n]$ and $h_1[n]$ followed by downsampling by 2, with $h_i[n] = g_i[-n]$.
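The sinc filters (3.1.23)-(3.1.24) can be checked numerically, with one caveat: the impulse responses are infinite, so the sketch truncates every inner product to $|n| \le M$ (M = 20001 is an arbitrary choice). Because the taps decay only like $1/n$, the truncated sums approach their limits slowly, which is the poor time localization described above.

```python
import math


def g0(n):
    """Ideal half-band lowpass (3.1.23); g0[0] = 1/sqrt(2) is the n -> 0 limit."""
    if n == 0:
        return 1 / math.sqrt(2)
    return math.sqrt(2) * math.sin(math.pi * n / 2) / (math.pi * n)


def g1(n):
    """Highpass (3.1.24): g1[n] = (-1)^n g0[-n + 1]."""
    return (-1) ** n * g0(1 - n)


def inner(f, g, M):
    """Truncated inner product sum_{|n| <= M} f(n) g(n); the true sum runs over all n."""
    return sum(f(n) * g(n) for n in range(-M, M + 1))


M = 20001   # the 1/n decay makes the truncation error shrink only like 1/M
energy = inner(g0, g0, M)                        # close to 1
even_shift = inner(g0, lambda n: g0(n - 2), M)   # close to 0, cf. (3.1.26)
cross = inner(g0, g1, M)                         # close to 0, cf. (3.1.27)
```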

Proposition 3.2

The set of functions as given in (3.1.25) is an orthonormal basis for signals
from $\ell_2(\mathbb{Z})$.

Proof

To prove that the set of functions $\varphi_k[n]$ is indeed an orthonormal basis, again we would
have to demonstrate orthonormality of the set as well as completeness. Let us demonstrate
orthonormality of basis functions. We will do that only for

$$\langle \varphi_{2k}[n], \varphi_{2l}[n] \rangle = \delta[k-l], \qquad (3.1.26)$$
and leave the other two cases

$$\langle \varphi_{2k}[n], \varphi_{2l+1}[n] \rangle = 0, \qquad (3.1.27)$$

$$\langle \varphi_{2k+1}[n], \varphi_{2l+1}[n] \rangle = \delta[k-l], \qquad (3.1.28)$$

as an exercise (Problem 3.1). First, because $\varphi_{2k}[n] = \varphi_0[n-2k]$, it suffices to show (3.1.26)
for $k = 0$, or equivalently, to prove that

$$\langle g_0[n], g_0[n-2l] \rangle = \delta[l].$$

From (2.5.19) this is equivalent to showing

$$|G_0(e^{j\omega})|^2 + |G_0(e^{j(\omega+\pi)})|^2 = 2,$$

which holds true since $G_0(e^{j\omega}) = \sqrt{2}$ between $-\pi/2$ and $\pi/2$ and $G_0(e^{j\omega}) = 0$ on the rest of the period. The proof of the other
orthogonality relations is similar.

The proof of completeness, which can be made along the lines of the proof in Propo- 
sition 3.1, is left to the reader (see Problem 3.1). 
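The equivalence invoked from (2.5.19) is easy to test on a filter with finitely many taps. The sketch below substitutes the Haar lowpass filter for the ideal one (whose infinite support would require truncation) and checks both sides: orthonormality to even shifts in time, and $|G_0(e^{j\omega})|^2 + |G_0(e^{j(\omega+\pi)})|^2 = 2$ on a grid of frequencies.

```python
import cmath
import math

S = 1 / math.sqrt(2)
g0 = {0: S, 1: S}   # Haar lowpass filter, used here only as a finite test case


def dtft(h, w):
    """H(e^{jw}) = sum_n h[n] e^{-jwn} for a finitely supported h stored as {n: h[n]}."""
    return sum(v * cmath.exp(-1j * w * n) for n, v in h.items())


def autocorr(h, m):
    """<h[n], h[n - m]> = sum_n h[n] h[n - m] for real h."""
    return sum(v * h.get(n - m, 0.0) for n, v in h.items())


# Time-domain side: <g0[n], g0[n - 2l]> = delta[l]
even_lags_ok = all(abs(autocorr(g0, 2 * l) - (1.0 if l == 0 else 0.0)) < 1e-12
                   for l in range(-3, 4))

# Frequency-domain side: |G0(e^{jw})|^2 + |G0(e^{j(w + pi)})|^2 = 2
power_ok = all(abs(abs(dtft(g0, w)) ** 2 + abs(dtft(g0, w + math.pi)) ** 2 - 2) < 1e-12
               for w in [0.1 * i for i in range(63)])
```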

As we said, the filters in this case have perfect frequency resolution. However,
the decay of the filters in time is rather poor, being of the order of $1/n$. The
multiresolution interpretation we gave for the Haar case holds here as well. The
perfect lowpass filter $h_0$, followed by downsampling, upsampling and interpolation
by $g_0$, leads to a projection of the signal onto the subspace of sequences bandlimited
to $[-\pi/2, \pi/2]$, given by $x_0$. Similarly, the other path in Figure 3.1 leads to a
projection onto the subspace of half-band highpass signals given by $x_1$. The two
subspaces are orthogonal and their sum is $\ell_2(\mathbb{Z})$. It is also clear that $x_0$ is a coarse,
lowpass approximation to $x$, while $x_1$ contains the additional frequencies necessary
to reconstruct $x$ from $x_0$.

An example describing the decomposition of a signal into downsampled lowpass 
and highpass components, with subsequent reconstruction using upsampling and 
interpolation, is shown in Figure 3.2. Ideal half-band filters are assumed. The 
reader is encouraged to verify this spectral decomposition using the downsampling 
and upsampling formulas (see (2.5.13) and (2.5.17)) from Section 2.5.3. 

3.1.4 Discussion 

In both the Haar and sinc cases above, we noticed that the expansion was not
time-invariant, but periodically time-varying. We show below that time invariance
in orthonormal expansions leads only to trivial solutions, and thus, any meaningful
orthonormal expansion of $\ell_2(\mathbb{Z})$ will be time-varying.

Proposition 3.3 

An orthonormal time-invariant signal decomposition will have no frequency 
resolution. 
[Figure 3.2: spectra in the lowpass (left) and highpass (right) channels, panels (a)-(f).]

Figure 3.2 Two-channel decomposition of a signal using ideal filters. Left side
depicts the process in the lowpass channel, while the right side depicts the
process in the highpass channel. (a) Original spectrum. (b) Spectra after
filtering. (c) Spectra after downsampling. (d) Spectra after upsampling.
(e) Spectra after interpolation filtering. (f) Reconstructed spectrum.



Proof

An expansion is time-invariant if $x[n] \leftrightarrow X[k]$ implies $x[n-m] \leftrightarrow X[k-m]$ for all $x[n]$ in
$\ell_2(\mathbb{Z})$. Thus, we have that

$$\langle \varphi_k[n], x[n-m] \rangle = \langle \varphi_{k-m}[n], x[n] \rangle.$$

By a change of variable, the left side is equal to $\langle \varphi_k[n+m], x[n] \rangle$, and then using $k' = k - m$,
we find that

$$\varphi_{k'+m}[n+m] = \varphi_{k'}[n], \qquad (3.1.29)$$

that is, the expansion operator is Toeplitz. Now, we want the expansion to be orthonormal,
that is, using (3.1.29),

$$\langle \varphi_k[n], \varphi_{k+m}[n] \rangle = \langle \varphi_k[n], \varphi_k[n-m] \rangle = \delta[m],$$

or the autocorrelation of $\varphi_k[n]$ is a Dirac function. In the Fourier domain, denoting by $\Phi(e^{j\omega})$ the DTFT of $\varphi_k[n]$, this leads to

$$|\Phi(e^{j\omega})|^2 = 1,$$

showing that the basis functions have no frequency selectivity since they are allpass functions.
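The proof hinges on the link between the autocorrelation of $\varphi_k[n]$ and $|\Phi(e^{j\omega})|^2$. The sketch below evaluates both for the Haar function $\varphi_0[n]$ and for a pure delay $\delta[n-3]$ (both chosen only as examples). The Haar autocorrelation is not a Dirac, since $r[\pm 1] = 1/2$, so its unit shifts are not orthonormal, and its spectrum $1 + \cos\omega$ is frequency selective; the delay has a Dirac autocorrelation and a flat spectrum, the trivial allpass case.

```python
import cmath
import math

S = 1 / math.sqrt(2)
phi0 = {0: S, 1: S}      # Haar basis function phi_0[n]
delay3 = {3: 1.0}        # a pure delay delta[n - 3], the trivial allpass case


def autocorr(h, m):
    """r[m] = <h[n], h[n - m]> for a real, finitely supported h stored as {n: h[n]}."""
    return sum(v * h.get(n - m, 0.0) for n, v in h.items())


def dtft(h, w):
    """H(e^{jw}) = sum_n h[n] e^{-jwn}."""
    return sum(v * cmath.exp(-1j * w * n) for n, v in h.items())


r_haar = {m: autocorr(phi0, m) for m in range(-2, 3)}     # r[+-1] = 1/2: not a Dirac
r_delay = {m: autocorr(delay3, m) for m in range(-2, 3)}  # exactly delta[m]

ws = [0.3 * i for i in range(21)]
haar_gain = [abs(dtft(phi0, w)) ** 2 for w in ws]     # 1 + cos(w): frequency selective
delay_gain = [abs(dtft(delay3, w)) ** 2 for w in ws]  # all ones: no frequency selectivity
```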
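Table 3.1 Basis functions (synthesis filters) in Haar and sinc cases.

$$\begin{array}{l|l|l}
 & \text{Haar} & \text{Sinc} \\ \hline
g_0[n] & (\delta[n] + \delta[n-1])/\sqrt{2} & \dfrac{1}{\sqrt{2}}\, \dfrac{\sin((\pi/2)n)}{(\pi/2)n} \\
g_1[n] & (\delta[n] - \delta[n-1])/\sqrt{2} & (-1)^n\, g_0[-n+1] \\
G_0(e^{j\omega}) & \sqrt{2}\, e^{-j\omega/2} \cos(\omega/2) & \sqrt{2} \text{ for } \omega \in [-\pi/2, \pi/2],\ 0 \text{ otherwise} \\
G_1(e^{j\omega}) & \sqrt{2}\, j\, e^{-j\omega/2} \sin(\omega/2) & -e^{-j\omega}\, G_0(-e^{-j\omega})
\end{array}$$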
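The Haar column of Table 3.1 can be checked by evaluating the z-transforms of the two impulse responses on the unit circle. The sketch also tests the relation listed for $G_1$ in the sinc column, reading $G_0(-e^{-j\omega})$ as the z-transform of $g_0$ evaluated at $z = -e^{-j\omega}$; since the Haar $g_1[n]$ also satisfies (3.1.24), the relation must hold for it as well. The frequency grid is an arbitrary choice.

```python
import cmath
import math

S = 1 / math.sqrt(2)
g0 = {0: S, 1: S}     # Haar g0[n] = (delta[n] + delta[n-1]) / sqrt(2)
g1 = {0: S, 1: -S}    # Haar g1[n] = (delta[n] - delta[n-1]) / sqrt(2)


def ztrans(h, z):
    """H(z) = sum_n h[n] z^{-n} for a finitely supported h stored as {n: h[n]}."""
    return sum(v * z ** (-n) for n, v in h.items())


def on_circle(w):
    """The point z = e^{jw} on the unit circle."""
    return cmath.exp(1j * w)


def max_error(f, g, ws):
    """Largest |f(w) - g(w)| over the frequency grid ws."""
    return max(abs(f(w) - g(w)) for w in ws)


ws = [-3.0 + 0.25 * i for i in range(25)]

err_G0 = max_error(lambda w: ztrans(g0, on_circle(w)),
                   lambda w: math.sqrt(2) * cmath.exp(-0.5j * w) * math.cos(w / 2), ws)
err_G1 = max_error(lambda w: ztrans(g1, on_circle(w)),
                   lambda w: math.sqrt(2) * 1j * cmath.exp(-0.5j * w) * math.sin(w / 2), ws)
# Relation from the sinc column: G1(e^{jw}) = -e^{-jw} G0(z) evaluated at z = -e^{-jw}
err_rel = max_error(lambda w: ztrans(g1, on_circle(w)),
                    lambda w: -cmath.exp(-1j * w) * ztrans(g0, -cmath.exp(-1j * w)), ws)
```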



Therefore, time variance is an inherent feature of orthonormal expansions. Note
that Proposition 3.3 does not hold if the orthogonality constraint is removed (see
Problem 3.3). Another consequence of Proposition 3.3 is that there are no nontrivial banded$^3$
orthonormal Toeplitz matrices, since the only allpass filters with a finite impulse response are pure delays (scaled by a unit-modulus constant); all other allpass filters have infinite impulse responses. However, in (3.1.18), we saw a banded block Toeplitz matrix (actually,
block diagonal) that was orthonormal. The construction of orthonormal FIR filter
banks is the study of such banded block Toeplitz matrices.

We have seen two extreme cases of structured series expansions of sequences,
based on Haar and sinc filters respectively (Table 3.1 gives basis functions for both
of these cases). More interesting cases exist between these extremes and they will be
implemented with filter banks as shown in Figure 3.1(a). Thus, we did not consider
arbitrary expansions of $\ell_2(\mathbb{Z})$, but rather a structured subclass. These expansions
will have the multiresolution characteristic already built in, which will be shown 
to be a framework for a large body of work on filter banks that appeared in the 
literature of the last decade. 



3.2 Two-Channel Filter Banks 

We saw in the last section how Haar and sinc expansions of discrete-time signals
could be implemented using a two-channel filter bank (see Figure 3.1(a)). The aim
in this section is to examine two-channel filter banks in more detail. The main idea
is that perfect reconstruction filter banks implement series expansions of discrete-time signals as in the Haar and sinc cases. Recall that in both of these cases, the
expansion is orthonormal and the basis functions are actually the impulse responses 
of the synthesis filters and their even shifts. In addition to the orthonormal case, 
we will consider biorthogonal (or general) expansions (filter banks) as well. 

The present section serves as a core for the remainder of the chapter; all important notions and concepts will be introduced here. For the sake of simplicity, we
concentrate on the two-channel case. More general solutions are given later in the
chapter. We start with tools for analyzing general filter banks. Then, we examine
orthonormal and linear phase two-channel filter banks in more detail. We then
present results valid for general two-channel filter banks and examine some special
cases, such as IIR solutions.

3 A banded Toeplitz matrix has a finite number of nonzero diagonals.

3.2.1 Analysis of Filter Banks 

Consider Figure 3.1(a). We saw in the Haar and sinc cases that such a two-channel
filter bank implements an orthonormal series expansion of discrete-time signals
with synthesis filters being the time-reversed version of the analysis filters, that is,
$g_i[n] = h_i[-n]$. Here, we relax the assumption of orthonormality and consider a
general filter bank, with analysis filters $h_0[n]$, $h_1[n]$ and synthesis filters $g_0[n]$, $g_1[n]$.
Our only requirement will be that such a filter bank implements an expansion of 
discrete-time signals (not necessarily orthonormal). Such an expansion will be 
termed biorthogonal. In the filter bank literature, such a system is called a perfect 
reconstruction filter bank. 

Looking at Figure 3.1, besides filtering, the key elements in the filter bank
computation of an expansion are downsamplers and upsamplers. These perform
the sampling rate changes, and the downsampler creates a periodically time-varying
linear system. As discussed in Section 2.5.3, special analysis techniques are needed
for such systems. We will present three ways to look at periodically time-varying
systems, namely in time, modulation, and polyphase domains. The first approach
was already used in our discussion of the Haar case. The two other approaches
are based on the Fourier or z-transform and aim at decomposing the periodically
time-varying system into several time-invariant subsystems.



Time-Domain Analysis Recall that in the Haar case (see (3.1.18)), in order to visualize block time invariance, we expressed the transform coefficients via an infinite
matrix, that is

$$\underbrace{\begin{pmatrix} \vdots \\ y_0[0] \\ y_1[0] \\ y_0[1] \\ y_1[1] \\ \vdots \end{pmatrix}}_{y} = \underbrace{\begin{pmatrix} \vdots \\ X[0] \\ X[1] \\ X[2] \\ X[3] \\ \vdots \end{pmatrix}}_{X} = T_a \cdot \underbrace{\begin{pmatrix} \vdots \\ x[0] \\ x[1] \\ x[2] \\ x[3] \\ \vdots \end{pmatrix}}_{x}. \qquad (3.2.1)$$

Here, the transform coefficients $X[k]$ are expressed in another form as well. In
the filter bank literature, it is more common to write $X[k]$ as outputs of the two
branches in Figure 3.1(a), that is, as two subband outputs denoted by $y_0[k] = X[2k]$
and $y_1[k] = X[2k+1]$. Also, in (3.2.1), $T_a \cdot x$ represents the inner products, where
$T_a$ is the analysis matrix and can be expressed as



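$$T_a = \begin{pmatrix} & \vdots & & & & & & & \\ \cdots & h_0[L-1] & h_0[L-2] & h_0[L-3] & \cdots & h_0[0] & 0 & 0 & \cdots \\ \cdots & h_1[L-1] & h_1[L-2] & h_1[L-3] & \cdots & h_1[0] & 0 & 0 & \cdots \\ \cdots & 0 & 0 & h_0[L-1] & \cdots & h_0[2] & h_0[1] & h_0[0] & \cdots \\ \cdots & 0 & 0 & h_1[L-1] & \cdots & h_1[2] & h_1[1] & h_1[0] & \cdots \\ & & & & & \vdots & & & \end{pmatrix},$$

where we assume that the analysis filters $h_i[n]$ are finite impulse response (FIR)
filters of length $L = 2K$. To make the block Toeplitz structure of $T_a$ more explicit,
we can write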



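$$T_a = \begin{pmatrix} \ddots & & & & & & \\ \cdots & A_0 & A_1 & \cdots & A_{K-1} & 0 & \cdots \\ \cdots & 0 & A_0 & \cdots & A_{K-2} & A_{K-1} & \cdots \\ & & & & & & \ddots \end{pmatrix}. \qquad (3.2.2)$$

The block $A_i$ is given by

$$A_i = \begin{pmatrix} h_0[2K-1-2i] & h_0[2K-2-2i] \\ h_1[2K-1-2i] & h_1[2K-2-2i] \end{pmatrix}. \qquad (3.2.3)$$

The transform coefficient

$$X[k] = \langle \varphi_k[n], x[n] \rangle$$

equals (in the case $k = 2k'$)

$$y_0[k'] = \langle h_0[2k'-n], x[n] \rangle, \qquad (3.2.4)$$

and (in the case $k = 2k'+1$)

$$y_1[k'] = \langle h_1[2k'-n], x[n] \rangle. \qquad (3.2.5)$$

The analysis basis functions are thus

$$\varphi_{2k}[n] = h_0[2k-n], \qquad \varphi_{2k+1}[n] = h_1[2k-n].$$

To resynthesize the signal, we use the dual-basis, synthesis, matrix $T_s$

$$x = T_s\, y = T_s\, X = T_s\, T_a\, x. \qquad (3.2.6)$$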
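Similarly to $T_a$, $T_s$ can be expressed as

$$T_s = \begin{pmatrix} & \vdots & & & & & & & \\ \cdots & g_0[0] & g_0[1] & g_0[2] & \cdots & g_0[L'-1] & 0 & 0 & \cdots \\ \cdots & g_1[0] & g_1[1] & g_1[2] & \cdots & g_1[L'-1] & 0 & 0 & \cdots \\ \cdots & 0 & 0 & g_0[0] & \cdots & g_0[L'-3] & g_0[L'-2] & g_0[L'-1] & \cdots \\ \cdots & 0 & 0 & g_1[0] & \cdots & g_1[L'-3] & g_1[L'-2] & g_1[L'-1] & \cdots \\ & & & & & \vdots & & & \end{pmatrix}^T = \begin{pmatrix} \ddots & & & & & & \\ \cdots & S_0^T & S_1^T & \cdots & S_{K'-1}^T & 0 & \cdots \\ \cdots & 0 & S_0^T & \cdots & S_{K'-2}^T & S_{K'-1}^T & \cdots \\ & & & & & & \ddots \end{pmatrix}^T, \qquad (3.2.7)$$

where the block $S_i$ is of size $2 \times 2$ and the FIR filters are of length $L' = 2K'$. The block
$S_i$ is

$$S_i = \begin{pmatrix} g_0[2i] & g_1[2i] \\ g_0[2i+1] & g_1[2i+1] \end{pmatrix},$$

where $g_0[n]$ and $g_1[n]$ are the synthesis filters. The dual synthesis basis functions
are

$$\tilde\varphi_{2k}[n] = g_0[n-2k], \qquad \tilde\varphi_{2k+1}[n] = g_1[n-2k].$$

Let us go back for a moment to (3.2.6). The requirement that $\{h_0[2k-n], h_1[2k-n]\}$
and $\{g_0[n-2k], g_1[n-2k]\}$ form a pair of dual bases is equivalent to

$$T_s\, T_a = T_a\, T_s = I. \qquad (3.2.8)$$

This is the biorthogonality condition or, in the filter bank literature, the perfect
reconstruction condition. In other words,

$$\langle \varphi_k[n], \tilde\varphi_l[n] \rangle = \delta[k-l],$$

or in terms of filter impulse responses

$$\langle h_i[2k-n], g_j[n-2l] \rangle = \delta[k-l]\, \delta[i-j], \qquad i, j = 0, 1.$$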
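The perfect reconstruction condition (3.2.8) can be illustrated with finite matrices if the infinite $T_a$ is replaced by a periodized (circular) $N \times N$ section, an assumption of this sketch rather than part of the text. It takes the orthonormal special case, where $h_i[n] = g_i[-n]$ and therefore $T_s = T_a^T$, and uses as hypothetical test filters the length-4 orthonormal lowpass filter commonly attributed to Daubechies, with $g_1[n] = (-1)^n g_0[3-n]$. A biorthogonal pair would instead need a separately designed $T_s$.

```python
import math

# Hypothetical test filters: length-4 orthonormal lowpass g0 (often called D4) and
# the "alternating flip" highpass g1[n] = (-1)^n g0[3 - n].
r3, c = math.sqrt(3), 4 * math.sqrt(2)
g0 = [(1 + r3) / c, (3 + r3) / c, (3 - r3) / c, (1 - r3) / c]
g1 = [(-1) ** n * g0[3 - n] for n in range(4)]


def periodized_analysis_matrix(N):
    """Circular N x N stand-in for T_a (N even): row 2k+i holds
    phi_{2k+i}[n] = h_i[2k - n] = g_i[n - 2k], with n taken modulo N."""
    T = [[0.0] * N for _ in range(N)]
    for k in range(N // 2):
        for i, g in enumerate((g0, g1)):
            for m, v in enumerate(g):
                T[2 * k + i][(2 * k + m) % N] += v
    return T


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def is_identity(A, tol=1e-12):
    return all(abs(A[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(len(A)) for j in range(len(A)))


N = 8
T_a = periodized_analysis_matrix(N)
T_s = transpose(T_a)   # orthonormal case: synthesis basis = analysis basis, so T_s = T_a^T
```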

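Consider the two branches in Figure 3.1(a), which produce $y_0$ and $y_1$. Call $H_i$ the operator corresponding to filtering by $h_i[n]$ followed by downsampling by 2. Then the output $y_i$ can be written as ($L$ denotes the filter length)

$$
\underbrace{\begin{pmatrix} \vdots \\ y_i[0] \\ y_i[1] \\ \vdots \end{pmatrix}}_{y_i}
=
\underbrace{\begin{pmatrix}
\ddots & & & & \\
\cdots & h_i[L-1] & h_i[L-2] & h_i[L-3] & \cdots \\
\cdots & 0 & 0 & h_i[L-1] & \cdots \\
 & & & & \ddots
\end{pmatrix}}_{H_i}
\underbrace{\begin{pmatrix} \vdots \\ x[0] \\ x[1] \\ \vdots \end{pmatrix}}_{x}, \tag{3.2.9}
$$

or, in operator notation,

$$
y_i = H_i\, x.
$$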



Defining $G_i^T$ similarly to $H_i$ but with $g_i[n]$ in reverse order (see also the definition of $T_s$), the output of the system can now be written as

$$
(G_0 H_0 + G_1 H_1)\, x.
$$

Thus, to resynthesize the signal (the condition for perfect reconstruction), we need

$$
G_0 H_0 + G_1 H_1 = I.
$$

Of course, by interleaving the rows of $H_0$ and $H_1$, we get $T_a$. Similarly, $T_s$ corresponds to interleaving the columns of $G_0$ and $G_1$.

To summarize this part on time-domain analysis, let us stress once more that biorthogonal expansions of discrete-time signals, where the basis functions are obtained from two prototype functions and their even shifts (for both dual bases), are implemented using a perfect reconstruction, two-channel multirate filter bank. In other words, perfect reconstruction is equivalent to the biorthogonality condition (3.2.8).

Completeness is also automatically satisfied. To prove it, we show that there exists no $x[n]$ with $\|x\| > 0$ that has a zero expansion, that is, such that $\|X\| = 0$. Suppose this is not true, that is, suppose that there exists an $x[n]$ with $\|x\| > 0$ such that $\|X\| = 0$. But, since $X = T_a x$, we have that

$$
\|T_a x\| = \|X\| = 0,
$$

and this is possible if and only if

$$
T_a\, x = 0 \tag{3.2.10}
$$

(since in a Hilbert space, $\ell^2(\mathbb{Z})$ in this case, $\|v\|^2 = \langle v, v \rangle = 0$ if and only if $v = 0$). We know that (3.2.10) has a nontrivial solution if and only if $T_a$ is singular. However, due to (3.2.8), $T_a$ is nonsingular, and thus (3.2.10) has only the trivial solution $x = 0$. This violates our assumption and proves completeness.
Modulation-Domain Analysis This approach is based on Fourier or, more generally, $z$-transforms. Recall from Section 2.5.3 that downsampling a signal with $z$-transform $X(z)$ by 2 leads to $X'(z)$ given by

$$
X'(z) = \frac{1}{2}\left[ X(z^{1/2}) + X(-z^{1/2}) \right]. \tag{3.2.11}
$$

Then, upsampling $X'(z)$ by 2 yields $X''(z) = X'(z^2)$, or

$$
X''(z) = \frac{1}{2}\left[ X(z) + X(-z) \right]. \tag{3.2.12}
$$

To verify (3.2.12) directly, notice that downsampling followed by upsampling by 2 simply nulls out the odd-indexed coefficients, that is, $x''[2n] = x[2n]$ and $x''[2n+1] = 0$. Then, note that $X(-z)$ is the $z$-transform of $(-1)^n x[n]$ by the modulation property, and therefore (3.2.12) follows.
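Both (3.2.11) and the direct argument for (3.2.12) are easy to confirm numerically. The following minimal sketch assumes a causal, finite test signal of our own choosing and an arbitrary evaluation point $z$.

```python
import cmath


def down_up(x):
    """Downsample by 2, then upsample by 2: keep even-indexed samples, zero the odd ones."""
    return [v if n % 2 == 0 else 0 for n, v in enumerate(x)]


def half_sum_with_modulated(x):
    """Time-domain form of (X(z) + X(-z))/2, i.e. (x[n] + (-1)^n x[n])/2."""
    return [(v + (-1) ** n * v) / 2 for n, v in enumerate(x)]


def ztrans(x, z):
    """X(z) = sum_n x[n] z^{-n} for a causal, finite sequence x."""
    return sum(v * z ** (-n) for n, v in enumerate(x))


x = [3, -1, 4, 1, -5, 9, 2, -6]    # arbitrary test signal x[0], x[1], ...
print(down_up(x))                  # [3, 0, 4, 0, -5, 0, 2, 0]
assert down_up(x) == half_sum_with_modulated(x)    # (3.2.12) in the time domain

# (3.2.11): the downsampled x'[m] = x[2m] has X'(z) = [X(z^{1/2}) + X(-z^{1/2})]/2.
z = 0.7 + 0.4j                     # arbitrary point away from the origin
s = cmath.sqrt(z)                  # either square root works
lhs = ztrans(x[::2], z)
rhs = (ztrans(x, s) + ztrans(x, -s)) / 2
assert abs(lhs - rhs) < 1e-9
```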

With this preamble, the $z$-transform analysis of the filter bank in Figure 3.1(a) becomes easy. Consider the lower branch. The filtered signal, which has the $z$-transform $H_0(z)\, X(z)$, goes through downsampling and upsampling, yielding (according to (3.2.12))

$$
\frac{1}{2}\left[ H_0(z) X(z) + H_0(-z) X(-z) \right].
$$

This signal is filtered with $G_0(z)$, leading to $\hat{X}_0(z)$ given by

$$
\hat{X}_0(z) = \frac{1}{2} G_0(z) \left[ H_0(z) X(z) + H_0(-z) X(-z) \right]. \tag{3.2.13}
$$

The upper branch contributes $\hat{X}_1(z)$, which equals (3.2.13) up to the change of index $0 \to 1$. The output of the analysis/synthesis filter bank is the sum of the two components $\hat{X}_0(z)$ and $\hat{X}_1(z)$. This is best written in matrix notation as

$$
\hat{X}(z) = \hat{X}_0(z) + \hat{X}_1(z)
= \frac{1}{2} \begin{pmatrix} G_0(z) & G_1(z) \end{pmatrix}
\underbrace{\begin{pmatrix} H_0(z) & H_0(-z) \\ H_1(z) & H_1(-z) \end{pmatrix}}_{H_m(z)}
\underbrace{\begin{pmatrix} X(z) \\ X(-z) \end{pmatrix}}_{x_m(z)}. \tag{3.2.14}
$$

In the above, $H_m(z)$ is the analysis modulation matrix containing the modulated versions of the analysis filters, and $x_m(z)$ contains the modulated versions of $X(z)$. Relation (3.2.14) is illustrated in Figure 3.3, where the time-varying part is in the lower channel. If the channel signals $Y_0(z)$ and $Y_1(z)$ (the downsampled-domain signals) are desired, it follows from (3.2.11) and (3.2.14) that

$$
\begin{pmatrix} Y_0(z) \\ Y_1(z) \end{pmatrix} = \frac{1}{2}
\begin{pmatrix} H_0(z^{1/2}) & H_0(-z^{1/2}) \\ H_1(z^{1/2}) & H_1(-z^{1/2}) \end{pmatrix}
\begin{pmatrix} X(z^{1/2}) \\ X(-z^{1/2}) \end{pmatrix}
$$

or, calling $y(z)$ the vector $[Y_0(z)\ \ Y_1(z)]^T$,

$$
y(z) = \frac{1}{2} H_m(z^{1/2})\, x_m(z^{1/2}).
$$

Figure 3.3 Modulation-domain analysis of the two-channel filter bank. The $2 \times 2$ matrix $H_m(z)$ contains the $z$-transforms of the filters and their modulated versions.



For the system to represent a valid expansion, (3.2.14) has to yield $\hat{X}(z) = X(z)$, which can be obtained when

$$
G_0(z) H_0(z) + G_1(z) H_1(z) = 2, \tag{3.2.15}
$$
$$
G_0(z) H_0(-z) + G_1(z) H_1(-z) = 0. \tag{3.2.16}
$$

The above two conditions then ensure perfect reconstruction. Expressing (3.2.15) and (3.2.16) in matrix notation, we get

$$
\begin{pmatrix} G_0(z) & G_1(z) \end{pmatrix} H_m(z) = \begin{pmatrix} 2 & 0 \end{pmatrix}. \tag{3.2.17}
$$



We can now solve for $G_0(z)$ and $G_1(z)$ (transpose (3.2.17) and multiply by $(H_m^T(z))^{-1}$ from the left):

$$
\begin{pmatrix} G_0(z) \\ G_1(z) \end{pmatrix} = \frac{2}{\det(H_m(z))} \begin{pmatrix} H_1(-z) \\ -H_0(-z) \end{pmatrix}. \tag{3.2.18}
$$



In the above, we assumed that $H_m(z)$ is nonsingular, that is, its normal rank is equal to 2. Define $P(z)$ as

$$
P(z) = G_0(z) H_0(z) = \frac{2}{\det(H_m(z))} H_0(z) H_1(-z), \tag{3.2.19}
$$

where we used (3.2.18). Observe that $\det(H_m(z)) = -\det(H_m(-z))$. Then, we can express the product $G_1(z) H_1(z)$ as

$$
G_1(z) H_1(z) = \frac{-2}{\det(H_m(z))} H_0(-z) H_1(z) = P(-z).
$$
It follows that (3.2.15) can be expressed in terms of $P(z)$ as

$$
P(z) + P(-z) = 2. \tag{3.2.20}
$$

We will show later that the function $P(z)$ plays a crucial role in analyzing and designing filter banks. For now, it suffices to note that, due to (3.2.20), all even-indexed coefficients of $P(z)$ equal 0, except for $p[0] = 1$. Thus, $P(z)$ is of the following form:

$$
P(z) = 1 + \sum_{k \in \mathbb{Z}} p[2k+1]\, z^{-(2k+1)}.
$$
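Conditions (3.2.15) and (3.2.16), together with the structure of $P(z)$, can be checked with exact rational arithmetic. The following sketch uses a 5/3 linear-phase pair chosen only for illustration. It stores a Laurent polynomial as a dictionary that maps $n$ to the coefficient of $z^{-n}$, and it implements $A(z) \mapsto A(-z)$ as a sign flip of the odd-indexed coefficients. The helper names are our own.

```python
from fractions import Fraction as Fr


def sign(n):
    return -1 if n % 2 else 1


def mul(a, b):
    """Product of Laurent polynomials stored as {n: coefficient of z^{-n}}."""
    out = {}
    for i, ai in a.items():
        for j, bj in b.items():
            out[i + j] = out.get(i + j, 0) + ai * bj
    return {n: c for n, c in out.items() if c != 0}


def add(a, b):
    out = dict(a)
    for n, c in b.items():
        out[n] = out.get(n, 0) + c
    return {n: c for n, c in out.items() if c != 0}


def modulate(a):
    """A(z) -> A(-z): flip the sign of the odd-indexed coefficients."""
    return {n: sign(n) * c for n, c in a.items()}


# Hypothetical 5/3 pair, chosen for illustration only.
h0 = {-2: Fr(-1, 8), -1: Fr(2, 8), 0: Fr(6, 8), 1: Fr(2, 8), 2: Fr(-1, 8)}
g0 = {-1: Fr(1, 2), 0: Fr(1), 1: Fr(1, 2)}
h1 = {n - 1: sign(n) * c for n, c in g0.items()}   # H1(z) = z G0(-z)
g1 = {n + 1: sign(n) * c for n, c in h0.items()}   # G1(z) = z^{-1} H0(-z)

no_distortion = add(mul(g0, h0), mul(g1, h1))               # left side of (3.2.15)
alias = add(mul(g0, modulate(h0)), mul(g1, modulate(h1)))   # left side of (3.2.16)
P = mul(g0, h0)                                             # P(z) = G0(z) H0(z)
even_part = {n: c for n, c in P.items() if n % 2 == 0}

print(no_distortion, alias)         # {0: Fraction(2, 1)} {}
print(even_part)                    # {0: Fraction(1, 1)}, so P(z) + P(-z) = 2
print(mul(g1, h1) == modulate(P))   # True: G1(z) H1(z) = P(-z)
```

For this pair, $P(z)$ has only $p[0] = 1$ and odd-indexed terms, so it is a valid function in the sense defined next.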

A polynomial or a rational function in $z$ satisfying (3.2.20) will be called valid. Following the definition of $P(z)$ in (3.2.19), we can rewrite (3.2.15), or equivalently (3.2.20), as

$$
G_0(z) H_0(z) + G_0(-z) H_0(-z) = 2. \tag{3.2.21}
$$

Using the modulation property, its time-domain equivalent is

$$
\sum_{k \in \mathbb{Z}} g_0[k]\, h_0[n-k] + (-1)^n \sum_{k \in \mathbb{Z}} g_0[k]\, h_0[n-k] = 2\delta[n],
$$

or equivalently,

$$
\sum_{k \in \mathbb{Z}} g_0[k]\, h_0[2n-k] = \delta[n],
$$

since the odd-indexed terms cancel. Written as an inner product,

$$
\langle g_0[k], h_0[2n-k] \rangle = \delta[n],
$$

this is one of the biorthogonality relations:

$$
\langle \varphi_0[k], \tilde{\varphi}_{2n}[k] \rangle = \delta[n].
$$

Similarly, starting from (3.2.15) or (3.2.16) and expressing $G_0(z)$ and $H_0(z)$ as functions of $G_1(z)$ and $H_1(z)$ would lead to the other biorthogonality relations, namely

$$
\langle \varphi_1[k], \tilde{\varphi}_{2n+1}[k] \rangle = \delta[n], \qquad
\langle \varphi_0[k], \tilde{\varphi}_{2n+1}[k] \rangle = 0, \qquad
\langle \varphi_1[k], \tilde{\varphi}_{2n}[k] \rangle = 0.
$$

Note that we obtained these relations for $\varphi_0$ and $\varphi_1$, but they also hold for $\varphi_{2l}$ and $\varphi_{2l+1}$, respectively. This shows once again that perfect reconstruction implies the biorthogonality conditions. The converse can be shown as well, which establishes that the two conditions are equivalent.
Figure 3.4 Polyphase-domain analysis. (a) Forward and inverse polyphase transform. (b) Analysis part in the polyphase domain. (c) Synthesis part in the polyphase domain.



Polyphase-Domain Analysis Although a very natural representation, modulation-domain analysis suffers from a drawback: it is redundant. Note how every filter coefficient appears twice in $H_m(z)$, since both the filter $H_i(z)$ and its modulated version $H_i(-z)$ are present. A more compact way of analyzing a filter bank uses polyphase-domain analysis, which was introduced in Section 2.5.3.

We therefore decompose both signals and filters into their polyphase components and use (2.5.23) with $N = 2$ to express the output of filtering followed by downsampling. For convenience, we introduce matrix notation to express the two channel signals $Y_0$ and $Y_1$:

$$
\underbrace{\begin{pmatrix} Y_0(z) \\ Y_1(z) \end{pmatrix}}_{y(z)}
=
\underbrace{\begin{pmatrix} H_{00}(z) & H_{01}(z) \\ H_{10}(z) & H_{11}(z) \end{pmatrix}}_{H_p(z)}
\underbrace{\begin{pmatrix} X_0(z) \\ X_1(z) \end{pmatrix}}_{x_p(z)}, \tag{3.2.22}
$$
where $H_{ij}$ is the $j$th polyphase component of the $i$th filter, or, following (2.5.22-2.5.23),

$$
H_i(z) = H_{i0}(z^2) + z\, H_{i1}(z^2).
$$

In (3.2.22), $y(z)$ contains the signals in the middle of the system in Figure 3.1(a). $H_p(z)$ contains the polyphase components of the analysis filters and is consequently called the analysis polyphase matrix. The vector $x_p(z)$ contains the polyphase components of the input signal or, following (2.5.20),

$$
X(z) = X_0(z^2) + z^{-1} X_1(z^2).
$$

It is instructive to give a block diagram of (3.2.22), as shown in Figure 3.4(b). First, the input signal $X$ is split into its polyphase components $X_0$ and $X_1$ using a forward polyphase transform. Then, a two-input, two-output system with transfer function matrix $H_p(z)$ produces the outputs $y_0$ and $y_1$.
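Equation (3.2.22) rests on one identity: filtering followed by downsampling equals the polyphase matrix acting on the polyphase components of the input. This identity can be checked directly in the time domain. The following sketch follows the conventions stated above, $H_i(z) = H_{i0}(z^2) + z H_{i1}(z^2)$ and $X(z) = X_0(z^2) + z^{-1} X_1(z^2)$. The filters and the input are arbitrary integers chosen for illustration.

```python
def conv(a, b):
    """Convolution of finitely supported sequences stored as {n: value}."""
    out = {}
    for i, ai in a.items():
        for j, bj in b.items():
            out[i + j] = out.get(i + j, 0) + ai * bj
    return {n: c for n, c in out.items() if c != 0}


def add(a, b):
    out = dict(a)
    for n, c in b.items():
        out[n] = out.get(n, 0) + c
    return {n: c for n, c in out.items() if c != 0}


def downsample(a):
    """y[m] = a[2m]."""
    return {n // 2: c for n, c in a.items() if n % 2 == 0}


def signal_polyphase(x):
    """X(z) = X0(z^2) + z^{-1} X1(z^2), i.e. x0[m] = x[2m], x1[m] = x[2m+1]."""
    return ({n // 2: c for n, c in x.items() if n % 2 == 0},
            {(n - 1) // 2: c for n, c in x.items() if n % 2})


def analysis_polyphase(h):
    """H(z) = H0(z^2) + z H1(z^2), i.e. h0[m] = h[2m], h1[m] = h[2m-1]."""
    return ({n // 2: c for n, c in h.items() if n % 2 == 0},
            {(n + 1) // 2: c for n, c in h.items() if n % 2})


def polyphase_matches(h, x):
    """Compare filter-then-downsample with one row of (3.2.22)."""
    direct = downsample(conv(h, x))
    hp0, hp1 = analysis_polyphase(h)
    x0, x1 = signal_polyphase(x)
    return direct == add(conv(hp0, x0), conv(hp1, x1))


# Hypothetical integer filters and input, chosen only for illustration.
filters = [{0: 1, 1: 3, 2: 3, 3: 1}, {-1: 2, 0: -1, 1: 5}]
x = {-2: 4, -1: 1, 0: -3, 1: 2, 2: 7, 3: -1}
print([polyphase_matches(h, x) for h in filters])   # [True, True]
```

The polyphase path works at half the rate and never computes the odd-indexed outputs that downsampling would discard, which is why this structure is preferred in implementations.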

The synthesis part of the system in Figure 3.1(a) can be analyzed in a similar fashion. It can be implemented with an inverse polyphase transform (as given on the right side of Figure 3.4(a)) preceded by a two-input, two-output synthesis polyphase matrix $G_p(z)$ defined by

$$
G_p(z) = \begin{pmatrix} G_{00}(z) & G_{10}(z) \\ G_{01}(z) & G_{11}(z) \end{pmatrix}, \tag{3.2.23}
$$

where

$$
G_i(z) = G_{i0}(z^2) + z^{-1} G_{i1}(z^2). \tag{3.2.24}
$$

The synthesis filter polyphase components are defined like those of the signal (2.5.20-2.5.21), that is, in the reverse order of those of the analysis filters. In Figure 3.4(c), we show how the output signal is synthesized from the channel signals $Y_0$ and $Y_1$ as

$$
\hat{X}(z) = \begin{pmatrix} 1 & z^{-1} \end{pmatrix} G_p(z^2) \begin{pmatrix} Y_0(z^2) \\ Y_1(z^2) \end{pmatrix}. \tag{3.2.25}
$$

This equation reflects that the channel signals are first upsampled by 2 (leading to $Y_i(z^2)$) and then filtered by the filters $G_i(z)$, which can be written as in (3.2.24). Note that the matrix-vector product in (3.2.25) is in $z^2$ and can thus be implemented before the upsampler by 2 (replacing $z^2$ by $z$), as shown in the figure.

Note the duality between the analysis and synthesis filter banks. The former uses a forward polyphase transform and the latter an inverse one, and $G_p(z)$ is arranged as the transpose of $H_p(z)$. The phase reversal in the definition of the polyphase components in analysis and synthesis comes from the fact that $z$ and $z^{-1}$ are dual operators, or, on the unit circle, $e^{j\omega} = (e^{-j\omega})^*$.
Obviously, the transfer function between the forward and inverse polyphase transforms defines the analysis/synthesis filter bank. This transfer polyphase matrix is given by

$$
T_p(z) = G_p(z)\, H_p(z).
$$

In order to find the input-output relationship, we use (3.2.22) as input to (3.2.25), which yields

$$
\hat{X}(z) = \begin{pmatrix} 1 & z^{-1} \end{pmatrix} G_p(z^2)\, H_p(z^2)\, x_p(z^2)
= \begin{pmatrix} 1 & z^{-1} \end{pmatrix} T_p(z^2)\, x_p(z^2). \tag{3.2.26}
$$

Obviously, if $T_p(z) = I$, we have

$$
\hat{X}(z) = \begin{pmatrix} 1 & z^{-1} \end{pmatrix} \begin{pmatrix} X_0(z^2) \\ X_1(z^2) \end{pmatrix} = X(z),
$$

following (2.5.20). That is, the analysis/synthesis filter bank achieves perfect reconstruction with no delay and is equivalent to Figure 3.4(a).
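As a check of (3.2.26), the following sketch builds $H_p(z)$ and $G_p(z)$ with the polyphase conventions of (3.2.22)-(3.2.24) and multiplies them. It uses a 5/3 linear-phase pair that we chose for illustration; any pair with zero-delay perfect reconstruction would do. The product comes out as the identity, so this bank reconstructs with no delay.

```python
from fractions import Fraction as Fr


def sign(n):
    return -1 if n % 2 else 1


def mul(a, b):
    """Product of Laurent polynomials stored as {n: coefficient of z^{-n}}."""
    out = {}
    for i, ai in a.items():
        for j, bj in b.items():
            out[i + j] = out.get(i + j, 0) + ai * bj
    return {n: c for n, c in out.items() if c != 0}


def add(a, b):
    out = dict(a)
    for n, c in b.items():
        out[n] = out.get(n, 0) + c
    return {n: c for n, c in out.items() if c != 0}


def analysis_polyphase(h):
    """H(z) = H_0(z^2) + z H_1(z^2)."""
    return ({n // 2: c for n, c in h.items() if n % 2 == 0},
            {(n + 1) // 2: c for n, c in h.items() if n % 2})


def synthesis_polyphase(g):
    """G(z) = G_0(z^2) + z^{-1} G_1(z^2), as in (3.2.24)."""
    return ({n // 2: c for n, c in g.items() if n % 2 == 0},
            {(n - 1) // 2: c for n, c in g.items() if n % 2})


def matmul(A, B):
    """Product of 2x2 matrices whose entries are Laurent polynomials."""
    return [[add(mul(A[r][0], B[0][c]), mul(A[r][1], B[1][c])) for c in range(2)]
            for r in range(2)]


# Hypothetical 5/3 pair with zero-delay perfect reconstruction (illustration only).
h0 = {-2: Fr(-1, 8), -1: Fr(2, 8), 0: Fr(6, 8), 1: Fr(2, 8), 2: Fr(-1, 8)}
g0 = {-1: Fr(1, 2), 0: Fr(1), 1: Fr(1, 2)}
h1 = {n - 1: sign(n) * c for n, c in g0.items()}   # H1(z) = z G0(-z)
g1 = {n + 1: sign(n) * c for n, c in h0.items()}   # G1(z) = z^{-1} H0(-z)

H00, H01 = analysis_polyphase(h0)
H10, H11 = analysis_polyphase(h1)
G00, G01 = synthesis_polyphase(g0)
G10, G11 = synthesis_polyphase(g1)

Hp = [[H00, H01], [H10, H11]]   # analysis polyphase matrix, (3.2.22)
Gp = [[G00, G10], [G01, G11]]   # synthesis polyphase matrix, (3.2.23)
Tp = matmul(Gp, Hp)             # transfer polyphase matrix
print(Tp)   # [[{0: Fraction(1, 1)}, {}], [{}, {0: Fraction(1, 1)}]]
```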

Relationships Between Time, Modulation and Polyphase Representations

Since they are different views of the same system, these representations are related. A few useful formulas are given below. From (2.5.20), we can write

$$
\begin{pmatrix} X_0(z^2) \\ X_1(z^2) \end{pmatrix} = \frac{1}{2}
\begin{pmatrix} 1 & 0 \\ 0 & z \end{pmatrix}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\begin{pmatrix} X(z) \\ X(-z) \end{pmatrix}, \tag{3.2.27}
$$

thus relating the polyphase and modulation representations of the signal, that is, $x_p(z)$ and $x_m(z)$. For the analysis filter bank, we have that

$$
\begin{pmatrix} H_{00}(z^2) & H_{01}(z^2) \\ H_{10}(z^2) & H_{11}(z^2) \end{pmatrix} = \frac{1}{2}
\begin{pmatrix} H_0(z) & H_0(-z) \\ H_1(z) & H_1(-z) \end{pmatrix}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix}, \tag{3.2.28}
$$

establishing the relationship between $H_p(z)$ and $H_m(z)$. Finally, following the definition of $G_p(z)$ in (3.2.23), and similarly to (3.2.28), we have

$$
\begin{pmatrix} G_{00}(z^2) & G_{10}(z^2) \\ G_{01}(z^2) & G_{11}(z^2) \end{pmatrix} = \frac{1}{2}
\begin{pmatrix} 1 & 0 \\ 0 & z \end{pmatrix}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\begin{pmatrix} G_0(z) & G_1(z) \\ G_0(-z) & G_1(-z) \end{pmatrix}, \tag{3.2.29}
$$

which relates $G_p(z)$ with $G_m(z)$, defined as

$$
G_m(z) = \begin{pmatrix} G_0(z) & G_1(z) \\ G_0(-z) & G_1(-z) \end{pmatrix}.
$$

Again, note that (3.2.28) is the transpose of (3.2.29), with a phase change in the diagonal matrix. The change from the polyphase to the modulation representation (and vice versa) involves more than a diagonal matrix with a delay (or phase factor). It also involves a sum and/or a difference operation (see the middle matrix in (3.2.27-3.2.29)). This is actually a size-2 Fourier transform, as will become clear in cases of higher dimension.

The relation between the time domain and the polyphase domain is most obvious for the synthesis filters $g_i$, since their impulse responses correspond to the first basis functions $\varphi_i$. Consider the time-domain synthesis matrix, and create a matrix $T_s(z)$,

$$
T_s(z) = \sum_{i=0}^{K'-1} S_i\, z^{-i},
$$

where $S_i$ are the successive $2 \times 2$ blocks along a column of the block Toeplitz matrix (there are $K'$ of them for filters of length $2K'$), or

$$
S_i = \begin{pmatrix} g_0[2i] & g_1[2i] \\ g_0[2i+1] & g_1[2i+1] \end{pmatrix}.
$$

Then, by inspection, $T_s(z)$ is identical to $G_p(z)$. A similar relation holds between $H_p(z)$ and the time-domain analysis matrix. It is a bit more involved, since time reversal has to be taken into account, and is given by

$$
T_a(z) = z^{-K+1}\, H_p(z^{-1}) \begin{pmatrix} 0 & 1 \\ z^{-1} & 0 \end{pmatrix},
$$

where

$$
T_a(z) = \sum_{i=0}^{K-1} A_i\, z^{-i}
$$

and

$$
A_i = \begin{pmatrix} h_0[2(K-i)-1] & h_0[2(K-i)-2] \\ h_1[2(K-i)-1] & h_1[2(K-i)-2] \end{pmatrix},
$$

$K$ being the number of $2 \times 2$ blocks in a row of the block Toeplitz matrix. The above relations can be used to establish equivalences between results in the various representations (see also Theorem 3.7 below).

3.2.2 Results on Filter Banks 

We now use the tools just established to review several classic results from the filter 
bank literature. These have a slightly different flavor than the expansion results 
which are concerned with the existence of orthogonal or biorthogonal bases. Here, 
approximate reconstruction is considered, and issues of realizability of the filters 
involved are very important. 
In the filter bank language, perfect reconstruction means that the output is a delayed and possibly scaled version of the input,

$$
\hat{X}(z) = c\, z^{-k} X(z).
$$

This is equivalent to saying that, up to a shift and scale, the impulse responses of the analysis filters (with time reversal) and of the synthesis filters form a biorthogonal basis.

Among approximate reconstructions, the most important one is alias-free reconstruction. Remember that because of the periodic time variance of analysis/synthesis filter banks, the output is a function of both $x[n]$ and its modulated version $(-1)^n x[n]$, or $X(z)$ and $X(-z)$ in the $z$-transform domain. The aliased component $X(-z)$ can be very disturbing in applications, and thus cancellation of aliasing is of prime importance. In particular, aliasing is a nonharmonic distortion: new sinusoidal components appear that are not harmonically related to the input. This is particularly disturbing in audio applications.

What follows are results on alias cancellation and perfect reconstruction for the two-channel case. Note that all the results are valid for the general $N$-channel case as well (substitute $N$ for 2 in statements and proofs).

For the first result, we need to introduce pseudocirculant matrices [311]. These are $N \times N$ circulant matrices with elements $F_{ij}(z)$, except that the lower triangular elements are multiplied by $z$, that is,

$$
F_{ij}(z) = \begin{cases} F_{0,j-i}(z), & j \ge i, \\ z\, F_{0,N+j-i}(z), & j < i. \end{cases}
$$

Then, the following holds: 

Proposition 3.4 

Aliasing in a one-dimensional subband coding system will be cancelled if and only if the transfer polyphase matrix $T_p$ is pseudocirculant [311].

Proof 

Consider a $2 \times 2$ pseudocirculant matrix

$$
T_p(z) = \begin{pmatrix} F_0(z) & F_1(z) \\ z F_1(z) & F_0(z) \end{pmatrix},
$$

and substitute it into (3.2.26):

$$
\hat{X}(z) = \begin{pmatrix} 1 & z^{-1} \end{pmatrix} T_p(z^2) \begin{pmatrix} X_0(z^2) \\ X_1(z^2) \end{pmatrix},
$$

yielding (use $F(z) = F_0(z^2) + z F_1(z^2)$)

$$
\begin{aligned}
\hat{X}(z) &= \begin{pmatrix} F(z) & z^{-1} F(z) \end{pmatrix} \begin{pmatrix} X_0(z^2) \\ X_1(z^2) \end{pmatrix} \\
&= F(z) \left( X_0(z^2) + z^{-1} X_1(z^2) \right) \\
&= F(z)\, X(z),
\end{aligned}
$$

that is, the result is a time-invariant system, so aliasing is cancelled. Conversely, given a time-invariant system defined by a transfer function $F(z)$, it can be shown (see [311]) that its polyphase implementation is pseudocirculant.
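The proof above can be mirrored numerically. The following sketch evaluates (3.2.26) in the time domain for hypothetical polynomials $F_0$ and $F_1$ with integer coefficients, then compares the result with plain filtering by $F(z) = F_0(z^2) + z F_1(z^2)$. It also shows that a matrix that is not pseudocirculant gives a system that is not shift-invariant. The helper names are our own.

```python
def mul(a, b):
    """Product of Laurent polynomials stored as {n: coefficient of z^{-n}}."""
    out = {}
    for i, ai in a.items():
        for j, bj in b.items():
            out[i + j] = out.get(i + j, 0) + ai * bj
    return {n: c for n, c in out.items() if c != 0}


def add(a, b):
    out = dict(a)
    for n, c in b.items():
        out[n] = out.get(n, 0) + c
    return {n: c for n, c in out.items() if c != 0}


def up(a):
    """A(z) -> A(z^2)."""
    return {2 * n: c for n, c in a.items()}


def delay(a, d):
    """Multiply by z^{-d} (d may be negative)."""
    return {n + d: c for n, c in a.items()}


def polyphase_system(T, x):
    """Output of (3.2.26): (1  z^{-1}) T(z^2) x_p(z^2)."""
    x0 = {n // 2: c for n, c in x.items() if n % 2 == 0}
    x1 = {(n - 1) // 2: c for n, c in x.items() if n % 2}
    top = add(mul(up(T[0][0]), up(x0)), mul(up(T[0][1]), up(x1)))
    bottom = add(mul(up(T[1][0]), up(x0)), mul(up(T[1][1]), up(x1)))
    return add(top, delay(bottom, 1))


# Hypothetical F0, F1 with integer coefficients (illustration only).
F0 = {0: 2, 1: -1}
F1 = {0: 1, 2: 3}
T_pc = [[F0, F1], [delay(F1, -1), F0]]   # pseudocirculant: lower-left entry is z F1(z)
F = add(up(F0), delay(up(F1), -1))       # F(z) = F0(z^2) + z F1(z^2)

x = {0: 1, 1: -2, 2: 5, 4: 3, 7: -1}     # arbitrary finite input
print(polyphase_system(T_pc, x) == mul(F, x))   # True: plain filtering by F(z)

# A matrix that is not pseudocirculant: delaying the input by one sample
# does not delay the output, so the system is not time-invariant.
T_bad = [[{0: 1}, {}], [{}, {}]]
print(polyphase_system(T_bad, {0: 1}), polyphase_system(T_bad, {1: 1}))   # {0: 1} {}
```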

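A corollary to Proposition 3.4 is that, for perfect reconstruction, the transfer function matrix has to be a pseudocirculant delay. That is, for an even delay $2k$,

$$
T_p(z) = z^{-k} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
$$

while for an odd delay $2k+1$,

$$
T_p(z) = z^{-k-1} \begin{pmatrix} 0 & 1 \\ z & 0 \end{pmatrix}.
$$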

The next result indicates when aliasing can be cancelled for a given analysis filter 
bank. Since the analysis and synthesis filter banks play dual roles, the result that 
we will discuss holds for synthesis filter banks as well. 

Proposition 3.5 

Given a two-channel filter bank downsampled by 2 with the polyphase matrix $H_p(z)$, alias-free reconstruction is possible if and only if the determinant of $H_p(z)$ is not identically zero, that is, $H_p(z)$ has normal rank 2.

Proof 

Choose the synthesis matrix as

$$
G_p(z) = \operatorname{cofactor}(H_p(z)),
$$

that is, the transposed matrix of cofactors (the adjugate) of $H_p(z)$. This results in

$$
T_p(z) = G_p(z)\, H_p(z) = \det(H_p(z))\, I,
$$

which is pseudocirculant and thus cancels aliasing. If, on the other hand, the system is alias-free, then we know (see Proposition 3.4) that $T_p(z)$ is pseudocirculant and therefore has full rank 2. Since the rank of a matrix product is bounded above by the rank of each of its factors, $H_p(z)$ has rank 2.$^4$

Often, one is interested in perfect reconstruction filter banks where all filters 
involved have a finite impulse response (FIR). Again, analysis and synthesis filter 
banks play the same role. 



4 Note that we excluded the case of zero reconstruction, even if technically it is also aliasing free 
(but of zero interest!). 
Proposition 3.6

Given a critically sampled FIR analysis filter bank, perfect reconstruction with FIR filters is possible if and only if $\det(H_p(z))$ is a pure delay.

Proof

Suppose that the determinant of $H_p(z)$ is a pure delay, and choose

$$
G_p(z) = \operatorname{cofactor}(H_p(z)).
$$

It is obvious that the above choice leads to perfect reconstruction with FIR filters. Suppose, on the other hand, that we have perfect reconstruction with FIR filters. Then, $T_p(z)$ has to be a pseudocirculant shift (corollary below Proposition 3.4), or

$$
\det(T_p(z)) = \det(G_p(z)) \cdot \det(H_p(z)) = z^{-l},
$$

meaning that it has $l$ poles at $z = 0$. Since the synthesis has to be FIR as well, $\det(G_p(z))$ has only zeros (or poles at the origin). Therefore, $\det(H_p(z))$ cannot have any zeros (except possibly at the origin or $\infty$).

If $\det(H_p(z))$ has no zeros, neither does $\det(H_m(z))$ (because of (3.2.28) and assuming FIR filters). Since $\det(H_m(z))$ is an odd function of $z$, it is of the form

$$
\det(H_m(z)) = a\, z^{-2k-1}
$$

(typically, $a = 2$), and following (3.2.18),

$$
G_0(z) = \frac{2}{a} z^{2k+1} H_1(-z), \tag{3.2.30}
$$
$$
G_1(z) = -\frac{2}{a} z^{2k+1} H_0(-z). \tag{3.2.31}
$$

These filters give perfect reconstruction with zero delay, but they are noncausal if the analysis filters are causal. Multiplying them by $z^{-2k-1}$ gives a causal version with perfect reconstruction and a delay of $2k+1$ samples (note that the shift can be arbitrary, since it only changes the overall delay).
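The construction (3.2.30)-(3.2.31) can be tried on the causal Haar pair. This sketch uses the unnormalized filters $1 \pm z^{-1}$, a scaling we chose to keep the arithmetic exact; it gives $a = 4$ instead of the typical $a = 2$. The code computes $\det(H_m(z))$, reads off $a$ and $2k+1$, and confirms perfect reconstruction both for the noncausal filters and for their delayed, causal versions.

```python
from fractions import Fraction as Fr


def sign(n):
    return -1 if n % 2 else 1


def mul(a, b):
    """Product of Laurent polynomials stored as {n: coefficient of z^{-n}}."""
    out = {}
    for i, ai in a.items():
        for j, bj in b.items():
            out[i + j] = out.get(i + j, 0) + ai * bj
    return {n: c for n, c in out.items() if c != 0}


def add(a, b):
    out = dict(a)
    for n, c in b.items():
        out[n] = out.get(n, 0) + c
    return {n: c for n, c in out.items() if c != 0}


def modulate(a):
    """A(z) -> A(-z)."""
    return {n: sign(n) * c for n, c in a.items()}


def scale(a, s):
    return {n: s * c for n, c in a.items()}


def delay(a, d):
    """Multiply by z^{-d}."""
    return {n + d: c for n, c in a.items()}


# Causal Haar analysis filters without the 1/sqrt(2) factor (our scaling choice).
h0 = {0: 1, 1: 1}    # 1 + z^{-1}
h1 = {0: 1, 1: -1}   # 1 - z^{-1}

# det(H_m(z)) = H0(z) H1(-z) - H0(-z) H1(z)
det_Hm = add(mul(h0, modulate(h1)), scale(mul(modulate(h0), h1), -1))
print(det_Hm)                  # {1: 4}: a = 4 and 2k + 1 = 1
(d, a), = det_Hm.items()       # a constant times an odd delay
assert d % 2 == 1

# (3.2.30)-(3.2.31): G0 = (2/a) z^{2k+1} H1(-z),  G1 = -(2/a) z^{2k+1} H0(-z)
g0 = scale(delay(modulate(h1), -d), Fr(2, a))
g1 = scale(delay(modulate(h0), -d), -Fr(2, a))

pr = add(mul(g0, h0), mul(g1, h1))                          # expect {0: 2}
alias = add(mul(g0, modulate(h0)), mul(g1, modulate(h1)))   # expect {}
# Causal versions: multiply by z^{-(2k+1)}; the output is delayed by 2k + 1.
pr_causal = add(mul(delay(g0, d), h0), mul(delay(g1, d), h1))
print(pr, alias, pr_causal)    # {0: Fraction(2, 1)} {} {1: Fraction(2, 1)}
```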

In the above results, we used the polyphase decomposition of filter banks. All these results can be translated to the other representations as well. In particular, aliasing cancellation can be studied in the modulation domain. Then, a necessary and sufficient condition for alias cancellation is that (see (3.2.14))

$$
\begin{pmatrix} G_0(z) & G_1(z) \end{pmatrix} H_m(z)
$$

be a row vector with only the first component different from zero. One could expand $\begin{pmatrix} G_0(z) & G_1(z) \end{pmatrix}$ into a matrix $G_m(z)$ by modulation, that is,

$$
G_m(z) = \begin{pmatrix} G_0(z) & G_1(z) \\ G_0(-z) & G_1(-z) \end{pmatrix}. \tag{3.2.32}
$$

It is then easy to see that, for the system to be alias-free,

$$
T_m(z) = G_m(z)\, H_m(z) = \begin{pmatrix} F(z) & 0 \\ 0 & F(-z) \end{pmatrix}.
$$

The matrix $T_m(z)$ is sometimes called the aliasing cancellation matrix [272].

Let us for a moment return to (3.2.14). As we said, $X(-z)$ is the aliased version of the signal. A necessary and sufficient condition for aliasing cancellation is that

$$
G_0(z) H_0(-z) + G_1(z) H_1(-z) = 0. \tag{3.2.33}
$$

The solution proposed by Croisier, Esteban, and Galand [69], known under the name QMF (quadrature mirror filters), cancels aliasing in a two-channel filter bank:

$$
H_1(z) = H_0(-z), \tag{3.2.34}
$$
$$
G_0(z) = H_0(z), \qquad G_1(z) = -H_1(z) = -H_0(-z). \tag{3.2.35}
$$

Substituting the above into (3.2.33) leads to $H_0(z) H_0(-z) - H_0(-z) H_0(z) = 0$, and aliasing is indeed cancelled. In order to achieve perfect reconstruction, the following has to be satisfied:

$$
G_0(z) H_0(z) + G_1(z) H_1(z) = 2 z^{-l}. \tag{3.2.36}
$$

For the QMF solution, (3.2.36) becomes

$$
H_0^2(z) - H_0^2(-z) = 2 z^{-l}. \tag{3.2.37}
$$

Note that the left side is an odd function of $z$, and thus $l$ has to be odd. The above relation explains the name QMF. On the unit circle, $H_0(-z) = H_0(e^{j(\omega+\pi)})$ is the mirror image of $H_0(z)$, and both the filter and its mirror image are squared. For FIR filters, condition (3.2.37) cannot be satisfied exactly except for the Haar filters introduced in Section 3.1. Taking a causal Haar filter, $H_0(z) = (1 + z^{-1})/\sqrt{2}$, (3.2.37) becomes

$$
\frac{1}{2}\left(1 + 2z^{-1} + z^{-2}\right) - \frac{1}{2}\left(1 - 2z^{-1} + z^{-2}\right) = 2 z^{-1}.
$$

For longer, linear-phase filters, (3.2.37) can only be approximated (see Section 3.2.4).
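A short numerical illustration of the QMF choice follows. It uses Python floats for the normalized Haar case and integers elsewhere, so the alias check is exact. Aliasing cancels for any $H_0(z)$. Condition (3.2.37) holds for the Haar filter but fails for $1 + 3z^{-1} + 3z^{-2} + z^{-3}$, a longer linear-phase example that we picked.

```python
import math


def sign(n):
    return -1 if n % 2 else 1


def mul(a, b):
    """Product of Laurent polynomials stored as {n: coefficient of z^{-n}}."""
    out = {}
    for i, ai in a.items():
        for j, bj in b.items():
            out[i + j] = out.get(i + j, 0) + ai * bj
    return {n: c for n, c in out.items() if c != 0}


def add(a, b):
    out = dict(a)
    for n, c in b.items():
        out[n] = out.get(n, 0) + c
    return {n: c for n, c in out.items() if c != 0}


def modulate(a):
    """A(z) -> A(-z)."""
    return {n: sign(n) * c for n, c in a.items()}


def scale(a, s):
    return {n: s * c for n, c in a.items()}


def qmf_bank(h0):
    """Croisier-Esteban-Galand choice (3.2.34)-(3.2.35): returns h1, g0, g1."""
    h1 = modulate(h0)          # H1(z) = H0(-z)
    g0 = dict(h0)              # G0(z) = H0(z)
    g1 = scale(h1, -1)         # G1(z) = -H1(z) = -H0(-z)
    return h1, g0, g1


def alias_term(h0):
    h1, g0, g1 = qmf_bank(h0)
    return add(mul(g0, modulate(h0)), mul(g1, modulate(h1)))   # left side of (3.2.33)


def distortion_term(h0):
    h1, g0, g1 = qmf_bank(h0)
    return add(mul(g0, h0), mul(g1, h1))    # equals H0^2(z) - H0^2(-z)


longer = {0: 1, 1: 3, 2: 3, 3: 1}   # hypothetical linear-phase filter, integer taps
print(alias_term(longer))           # {}: aliasing cancels whatever H0 is

r = 1 / math.sqrt(2)
haar = distortion_term({0: r, 1: r})
print(list(haar), math.isclose(haar[1], 2.0))   # [1] True: 2 z^{-1}, so l = 1

print(distortion_term(longer))      # {1: 12, 3: 40, 5: 12}: not a single delay
```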

Summary of Biorthogonality Relations Let us summarize our findings on biorthogonal filter banks.
Theorem 3.7

In a two-channel, biorthogonal, real-coefficient filter bank, the following are equivalent:

(a) $\langle h_i[-n], g_j[n-2m] \rangle = \delta[i-j]\, \delta[m]$, $i, j = 0, 1$.

(b) $G_0(z) H_0(z) + G_1(z) H_1(z) = 2$, and $G_0(z) H_0(-z) + G_1(z) H_1(-z) = 0$.

(c) $T_s \cdot T_a = T_a \cdot T_s = I$.

(d) $G_m(z) H_m(z) = H_m(z) G_m(z) = 2I$.

(e) $G_p(z) H_p(z) = H_p(z) G_p(z) = I$.

The proof follows from the equivalences between the various representations introduced in this section and is left as an exercise (see Problem 3.4). Note that we are assuming a critically sampled filter bank. Thus, the matrices in points (c)-(e) are square, and left inverses are also right inverses.

3.2.3 Analysis and Design of Orthogonal FIR Filter Banks 

Assume now that we impose two constraints on our filter bank. First, it should implement an orthonormal expansion$^5$ of discrete-time signals. Second, the filters used should be FIR.

Let us first concentrate on the orthonormality requirement. We saw in the Haar and sinc cases (both orthonormal expansions) that the expansion was of the form

$$
x[n] = \sum_{k \in \mathbb{Z}} \langle \varphi_k[l], x[l] \rangle\, \varphi_k[n] = \sum_{k \in \mathbb{Z}} X[k]\, \varphi_k[n], \tag{3.2.38}
$$

with the basis functions being

$$
\varphi_{2k}[n] = h_0[2k-n] = g_0[n-2k], \tag{3.2.39}
$$
$$
\varphi_{2k+1}[n] = h_1[2k-n] = g_1[n-2k], \tag{3.2.40}
$$

or, the even shifts of the synthesis filters (even shifts of the time-reversed analysis filters). We will show here that (3.2.38-3.2.40) describe orthonormal expansions in the general case.



5 The term orthogonal is often used, especially for the associated filters or filter banks. For filter banks, the terms unitary and paraunitary are also common, as is the notion of losslessness (see Appendix 3.A).
Orthonormality in Time Domain Start with a general filter bank as given in Figure 3.1(a). Impose orthonormality on the expansion, that is, the dual basis $\{\tilde{\varphi}_k[n]\}$ becomes identical to $\{\varphi_k[n]\}$. In filter bank terms, the dual basis (the synthesis filters) now becomes

$$
\{g_0[n-2k], g_1[n-2k]\} = \{\varphi_k[n]\} = \{\tilde{\varphi}_k[n]\} = \{h_0[2k-n], h_1[2k-n]\}, \tag{3.2.41}
$$

or,

$$
g_i[n] = h_i[-n], \qquad i = 0, 1. \tag{3.2.42}
$$

Thus, we have encountered the first important consequence of orthonormality: the synthesis filters are the time-reversed versions of the analysis filters. Also, since (3.2.41) holds and $\{\varphi_k\}$ is an orthonormal set, the following are the orthogonality relations for the synthesis filters:

$$
\langle g_i[n-2k], g_j[n-2l] \rangle = \delta[i-j]\, \delta[k-l], \tag{3.2.43}
$$

with a similar relation for the analysis filters. We call this an orthonormal filter bank.

Let us now see how orthonormality can be expressed using matrix notation. First, substituting the expression for $g_i[n]$ given by (3.2.42) into the synthesis matrix $T_s$ given in (3.2.7), we see that

$$
T_s = T_a^T,
$$

or, the perfect reconstruction condition is

$$
T_s\, T_a = T_a^T\, T_a = I. \tag{3.2.44}
$$

That is, the above condition means that the matrix $T_a$ is unitary. Because it is full rank, the product commutes and we also have $T_a T_a^T = I$. Thus, having an orthonormal basis, or perfect reconstruction with an orthonormal filter bank, is equivalent to the analysis matrix $T_a$ being unitary.

If we separate the outputs as was done in (3.2.9), and note that

$$
G_i = H_i^T,
$$

then the following is obtained from (3.2.43):

$$
H_i\, H_j^T = \delta[i-j]\, I, \qquad i, j = 0, 1.
$$

Now, the output of one channel in Figure 3.1(a) (filtering, downsampling, upsampling and filtering) is equal to

$$
M_i = H_i^T H_i.
$$
It is easy to verify that $M_i$ satisfies the requirements for an orthogonal projection (see Appendix 2.A), since $M_i^T = M_i$ and $M_i^2 = M_i$. Thus, the two channels of the filter bank correspond to orthogonal projections onto spaces spanned by their respective impulse responses, and perfect reconstruction can be written as the direct sum of the projections

$$
H_0^T H_0 + H_1^T H_1 = I.
$$

Note also that, to visualize the action of the matrix $T_a$, it is sometimes expressed in terms of $2 \times 2$ blocks $A_i$ (see (3.2.2-3.2.3)). These blocks can also be used to express orthonormality as follows (see (3.2.44)):

$$
\sum_{i=0}^{K-1} A_i^T A_i = I, \qquad
\sum_{i=0}^{K-1-j} A_{i+j}^T A_i = 0, \quad j = 1, \ldots, K-1.
$$
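The matrix statements above refer to infinite matrices. As a finite stand-in, the following sketch periodizes the channel operators on $N = 8$ samples; this is an assumption made only to obtain finite matrices. It uses the Daubechies length-4 lowpass filter, a standard orthonormal example that is not derived in this section. The highpass filter is built with the time-domain form of (3.2.52), $g_1[n] = (-1)^n g_0[2K-1-n]$. The code checks (3.2.44), checks that each $M_i$ is an orthogonal projection, and checks that the two projections sum to the identity.

```python
import math


def daub4():
    """Daubechies length-4 orthonormal lowpass filter (a standard example)."""
    s3, d = math.sqrt(3), 4 * math.sqrt(2)
    return [(1 + s3) / d, (3 + s3) / d, (3 - s3) / d, (1 - s3) / d]


def highpass(g0):
    """g1[n] = (-1)^n g0[L-1-n] with L = 2K, the time-domain form of (3.2.52)."""
    L = len(g0)
    return [(-1) ** n * g0[L - 1 - n] for n in range(L)]


def channel_rows(g, N):
    """Rows n -> g[n - 2k] with indices mod N: a periodized stand-in for H_i."""
    rows = []
    for k in range(N // 2):
        row = [0.0] * N
        for m, c in enumerate(g):
            row[(m + 2 * k) % N] += c
        rows.append(row)
    return rows


def matmul(A, B):
    return [[sum(a * b for a, b in zip(r, col)) for col in zip(*B)] for r in A]


def transpose(A):
    return [list(c) for c in zip(*A)]


def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


N = 8                                               # even period, longer than the filter
g0 = daub4()
g1 = highpass(g0)
H0, H1 = channel_rows(g0, N), channel_rows(g1, N)
Ta = [row for pair in zip(H0, H1) for row in pair]  # interleave the rows of H0 and H1
I = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]

M0 = matmul(transpose(H0), H0)                      # one channel: H_0^T H_0
M1 = matmul(transpose(H1), H1)
checks = [
    close(matmul(transpose(Ta), Ta), I),                        # (3.2.44): Ta^T Ta = I
    close(matmul(M0, M0), M0) and close(matmul(M1, M1), M1),    # idempotent
    close(transpose(M0), M0) and close(transpose(M1), M1),      # symmetric
    close([[a + b for a, b in zip(r0, r1)] for r0, r1 in zip(M0, M1)], I),
]
print(checks)   # [True, True, True, True]
```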

Orthonormality in Modulation Domain To see how orthonormality translates into the modulation domain, consider (3.2.43) with $i = j = 0$. Substitute $n' = n - 2k$. Thus, we have

$$
\langle g_0[n'], g_0[n' + 2(k-l)] \rangle = \delta[k-l],
$$

or

$$
\langle g_0[n], g_0[n+2m] \rangle = \delta[m]. \tag{3.2.45}
$$

Recall that $p[l] = \langle g_0[n], g_0[n+l] \rangle$ is the autocorrelation of the sequence $g_0[n]$ (see Section 2.5.2). Then, (3.2.45) is simply the autocorrelation of $g_0[n]$ evaluated at even indexes $l = 2m$, that is, $p[l]$ downsampled by 2: $p'[m] = p[2m]$. The $z$-transform of $p'[m]$ is (see Section 2.5.3)

$$
P'(z) = \frac{1}{2}\left[ P(z^{1/2}) + P(-z^{1/2}) \right].
$$

Replacing $z$ by $z^2$ (for notational convenience) and recalling that the $z$-transform of the autocorrelation of $g_0[n]$ is given by $P(z) = G_0(z) \cdot G_0(z^{-1})$, the $z$-transform of (3.2.45) becomes

$$
G_0(z) G_0(z^{-1}) + G_0(-z) G_0(-z^{-1}) = 2. \tag{3.2.46}
$$

Using the same arguments for the other cases in (3.2.43), we also have that

$$
G_1(z) G_1(z^{-1}) + G_1(-z) G_1(-z^{-1}) = 2, \tag{3.2.47}
$$
$$
G_0(z) G_1(z^{-1}) + G_0(-z) G_1(-z^{-1}) = 0. \tag{3.2.48}
$$
On the unit circle, (3.2.46-3.2.47) become (use $G(e^{-j\omega}) = G^*(e^{j\omega})$, since the filters have real coefficients)

$$
|G_i(e^{j\omega})|^2 + |G_i(e^{j(\omega+\pi)})|^2 = 2, \tag{3.2.49}
$$

that is, the filter and its modulated version are power complementary (their magnitudes squared sum up to a constant). Since this condition was used in [270] for designing the first orthogonal filter banks, it is also called the Smith-Barnwell condition. Writing (3.2.46-3.2.48) in matrix form,

$$
\begin{pmatrix} G_0(z^{-1}) & G_0(-z^{-1}) \\ G_1(z^{-1}) & G_1(-z^{-1}) \end{pmatrix}
\begin{pmatrix} G_0(z) & G_1(z) \\ G_0(-z) & G_1(-z) \end{pmatrix}
= \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}, \tag{3.2.50}
$$

that is, using the synthesis modulation matrix $G_m(z)$ (see (3.2.32)),

$$
G_m^T(z^{-1})\, G_m(z) = 2I. \tag{3.2.51}
$$

Since $g_i$ and $h_i$ are identical up to time reversal, a similar relation holds for the analysis modulation matrix $H_m(z)$ (up to a transpose), or $H_m^T(z^{-1})\, H_m(z) = 2I$.
A matrix satisfying (3.2.51) is called paraunitary (note that we have assumed that the filter coefficients are real). If all its entries are stable (which they are in this case, since we assumed the filters to be FIR), then such a matrix is called lossless. The concept of losslessness comes from classical circuit theory [23, 308] and is discussed in more detail in Appendix 3.A. It suffices to say at this point that having a lossless transfer matrix is equivalent to the filter bank implementing an orthogonal transform. Concentrating on lossless modulation matrices, we can continue our analysis of orthogonal systems in the modulation domain. First, from (3.2.50) we can see that $\begin{pmatrix} G_1(z^{-1}) & G_1(-z^{-1}) \end{pmatrix}$ has to be orthogonal to $\begin{pmatrix} G_0(z) & G_0(-z) \end{pmatrix}$. It will be proven in Appendix 3.A (although in the polyphase domain) that this implies that the two filters $G_0(z)$ and $G_1(z)$ are related as follows:

$$
G_1(z) = -z^{-2K+1} G_0(-z^{-1}), \tag{3.2.52}
$$

or, in the time domain,

$$
g_1[n] = (-1)^n g_0[2K-1-n].
$$

Equation (3.2.52) therefore establishes an important property of an orthogonal system: in an orthogonal two-channel filter bank, all filters are obtained from a single prototype filter.

This single prototype filter has to satisfy the power complementary property given by (3.2.49). For filter design purposes, one can use (3.2.46) and design an autocorrelation function $P(z)$ that satisfies $P(z) + P(-z) = 2$, as will be shown below. This special form of the autocorrelation function can be used to prove that the filters in an orthogonal FIR filter bank have to be of even length (Problem 3.5).
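The frequency-domain conditions can be spot-checked on a grid. This sketch takes the Daubechies length-4 filter as the prototype (our choice of example) and builds the highpass filter from the time-domain form of (3.2.52). It evaluates the power complementarity (3.2.49), the paraunitarity (3.2.51) on the unit circle, where $z^{-1} = z^*$ for real coefficients, and the value $G_0(1) = \sqrt{2}$ for a prototype with a zero at $\pi$.

```python
import cmath
import math


def daub4():
    """Daubechies length-4 orthonormal lowpass filter (a standard example)."""
    s3, d = math.sqrt(3), 4 * math.sqrt(2)
    return [(1 + s3) / d, (3 + s3) / d, (3 - s3) / d, (1 - s3) / d]


def highpass(g0):
    """g1[n] = (-1)^n g0[L-1-n], the time-domain form of (3.2.52)."""
    L = len(g0)
    return [(-1) ** n * g0[L - 1 - n] for n in range(L)]


def freq(g, z):
    """G(z) = sum_n g[n] z^{-n} for a causal filter given as a list."""
    return sum(c * z ** (-n) for n, c in enumerate(g))


g0 = daub4()
g1 = highpass(g0)

worst_pc = worst_pu = 0.0
for t in range(64):
    z = cmath.exp(2j * math.pi * t / 64)          # grid on the unit circle
    # (3.2.49): |G0(e^{jw})|^2 + |G0(e^{j(w+pi)})|^2 = 2
    pc = abs(freq(g0, z)) ** 2 + abs(freq(g0, -z)) ** 2
    worst_pc = max(worst_pc, abs(pc - 2))
    # (3.2.51) with z^{-1} = conj(z): conj(G_m)^T G_m = 2I
    Gm = [[freq(g0, z), freq(g1, z)], [freq(g0, -z), freq(g1, -z)]]
    for r in range(2):
        for c in range(2):
            entry = sum(Gm[k][r].conjugate() * Gm[k][c] for k in range(2))
            worst_pu = max(worst_pu, abs(entry - (2 if r == c else 0)))

print(worst_pc < 1e-12, worst_pu < 1e-12)                                    # True True
print(math.isclose(freq(g0, 1), math.sqrt(2)), abs(freq(g0, -1)) < 1e-12)   # True True
```

A grid check like this only samples the unit circle. It gives evidence for the identities but does not replace the algebraic argument.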
Orthonormality in Polyphase Domain We have seen that the polyphase and modulation matrices are related as in (3.2.29). Since $G_m$ and $G_p$ are related by unitary operations, $G_p$ will be lossless if and only if $G_m$ is lossless. Thus, one can search for or examine an orthonormal system in either the modulation or the polyphase domain, since

$$
\begin{aligned}
G_p^T(z^{-2})\, G_p(z^2)
&= \frac{1}{4} G_m^T(z^{-1})
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & z \end{pmatrix}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
G_m(z) \\
&= \frac{1}{2} G_m^T(z^{-1})\, G_m(z) = I,
\end{aligned} \tag{3.2.53}
$$

where we used (3.2.51). Since (3.2.53) also implies $G_p(z)\, G_p^T(z^{-1}) = I$ (the left inverse is also the right inverse), the following holds. Given a paraunitary $G_p(z)$ corresponding to an orthogonal synthesis filter bank, we can choose the analysis filter bank with polyphase matrix $H_p(z) = G_p^T(z^{-1})$ and get perfect reconstruction with no delay.

Summary of Orthonormality Relations Let us summarize our findings so far.

Theorem 3.8

In a two-channel, orthonormal, FIR, real-coefficient filter bank, the following are equivalent:

(a) $\langle g_i[n], g_j[n+2m] \rangle = \delta[i-j]\, \delta[m]$, $i, j = 0, 1$.

(b) $G_0(z) G_0(z^{-1}) + G_0(-z) G_0(-z^{-1}) = 2$, and $G_1(z) = -z^{-2K+1} G_0(-z^{-1})$, $K \in \mathbb{Z}$.

(c) $T_s^T T_s = T_s T_s^T = I$, $T_a = T_s^T$.

(d) $G_m^T(z^{-1})\, G_m(z) = G_m(z)\, G_m^T(z^{-1}) = 2I$, $H_m(z) = G_m^T(z^{-1})$.

(e) $G_p^T(z^{-1})\, G_p(z) = G_p(z)\, G_p^T(z^{-1}) = I$, $H_p(z) = G_p^T(z^{-1})$.

Again, in relations (c), (d) and (e) we used the fact that the left inverse of a square matrix is also its right inverse. The proof follows from the relations between the various representations and is left as an exercise (see Problem 3.7). Note that the theorem holds in more general cases as well. In particular, the filters do not have to be restricted to be FIR. If their coefficients are complex valued, transposes have to be Hermitian transposes. In the case of $G_m$ and $G_p$, only the coefficients of the filters have to be conjugated, not $z$, since $z^{-1}$ plays that role.
Because all filters are related to a single prototype satisfying (a) or (b), the other filter in the synthesis filter bank follows by modulation, time reversal and an odd shift (see (3.2.52)). The filters in the analysis bank are simply time-reversed versions of the synthesis filters. In the FIR case, the length of the filters is even. Let us formalize these statements:

Corollary 3.9

In a two-channel, orthonormal, FIR, real-coefficient filter bank, the following hold:

(a) The filter length $L$ is even, or $L = 2K$.

(b) The filters satisfy the power complementary or Smith-Barnwell condition,

$$
|G_0(e^{j\omega})|^2 + |G_0(e^{j(\omega+\pi)})|^2 = 2, \qquad |G_0(e^{j\omega})|^2 + |G_1(e^{j\omega})|^2 = 2. \tag{3.2.54}
$$

(c) The highpass filter is specified (up to an even shift and a sign change) by the lowpass filter as

$$
G_1(z) = -z^{-2K+1} G_0(-z^{-1}).
$$

(d) If the lowpass filter has a zero at $\pi$, that is, $G_0(-1) = 0$, then

$$
G_0(1) = \sqrt{2}. \tag{3.2.55}
$$

Also, an orthogonal filter bank has, as any orthogonal transform, an energy conser- 
vation property: 

Proposition 3.10 

In an orthonormal filter bank, that is, a filter bank with a unitary polyphase 
or modulation matrix, the energy is conserved between the input and the 
channel signals, 

1 1 1 1 z M ii 2 i ii 1 1 '2 /oop'/^N 

IFF = 1 1 2/0 1 1 + \\yi\\ ■ (3.2.56) 

Proof 

The energy of the subband signals equals

$$\|y_0\|^2 + \|y_1\|^2 = \frac{1}{2\pi}\int_0^{2\pi} \left(|Y_0(e^{j\omega})|^2 + |Y_1(e^{j\omega})|^2\right) d\omega,$$

by Parseval's relation (2.4.37). Using the fact that $y(z) = H_p(z)\,x_p(z)$, the right side can be written as (with $[\cdot]^*$ denoting conjugate transposition)

$$\frac{1}{2\pi}\int_0^{2\pi} [y(e^{j\omega})]^*\, y(e^{j\omega})\, d\omega = \frac{1}{2\pi}\int_0^{2\pi} [x_p(e^{j\omega})]^*\, [H_p(e^{j\omega})]^* H_p(e^{j\omega})\, x_p(e^{j\omega})\, d\omega = \frac{1}{2\pi}\int_0^{2\pi} [x_p(e^{j\omega})]^*\, x_p(e^{j\omega})\, d\omega = \|x_0\|^2 + \|x_1\|^2.$$

We used the fact that $H_p(e^{j\omega})$ is unitary and Parseval's relation. Finally, (3.2.56) follows from the fact that the energy of the signal is equal to the sum of the polyphase components' energy, $\|x\|^2 = \|x_0\|^2 + \|x_1\|^2$.
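Proposition 3.10 can be checked numerically on a finite signal if the signal is treated as one period of a periodic signal, which keeps the orthonormal transform square. The periodization, the test signal and the function names below are assumptions of this sketch, not part of the proposition. Each channel computes inner products with even shifts of a synthesis filter, which is what filtering with the time-reversed filter followed by downsampling by 2 amounts to.

```python
import math


def analysis_periodic(x, g):
    """One analysis channel: y[n] = sum_k g[k] x[(2n + k) mod N], n = 0, ..., N/2 - 1.

    This equals filtering with the time-reversed filter g[-n] and keeping every
    second sample, with x treated as one period of a periodic signal."""
    N = len(x)
    return [sum(g[k] * x[(2 * n + k) % N] for k in range(len(g))) for n in range(N // 2)]


def energy(v):
    """Squared Euclidean norm of a finite sequence."""
    return sum(t * t for t in v)


s3 = math.sqrt(3)
g0 = [v / (4 * math.sqrt(2)) for v in (1 + s3, 3 + s3, 3 - s3, 1 - s3)]  # (3.2.59)
g1 = [(-1) ** n * g0[len(g0) - 1 - n] for n in range(len(g0))]          # orthogonal highpass

x = [math.sin(0.3 * n) + 0.1 * n for n in range(16)]  # arbitrary even-length test signal
y0 = analysis_periodic(x, g0)
y1 = analysis_periodic(x, g1)
print(energy(x), energy(y0) + energy(y1))  # equal up to rounding
```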

Designing Orthogonal Filter Banks Now, we give two design procedures: the 
first, based on spectral factorization, and the second, based on lattice structures. 
Let us just note that most of the methods in the literature design analysis filters. 
We will give designs for synthesis filters so as to be consistent with our approach; 
however, analysis filters are easily obtained by time reversing the synthesis ones. 

Designs Based on Spectral Factorizations The first solution we will show is due to Smith and Barnwell [271]. The approach here is to find an autocorrelation sequence $P(z) = G_0(z)G_0(z^{-1})$ that satisfies (3.2.46) and then to perform spectral factorization as explained in Section 2.5.2. However, factorization becomes numerically ill-conditioned as the filter size grows, and thus, the resulting filters are usually only approximately orthogonal.

Example 3.1 

Choose $p[n]$ as a windowed version of a perfect half-band lowpass filter,

$$p[n] = \begin{cases} w[n]\,\dfrac{\sin(\pi n/2)}{\pi n/2}, & n = -2K+1, \ldots, 2K-1, \\ 0, & \text{otherwise}, \end{cases}$$

where $w[n]$ is a symmetric window function with $w[0] = 1$. Because $p[2n] = \delta[n]$, the z-transform of $p[n]$ satisfies

$$P(z) + P(-z) = 2. \qquad (3.2.57)$$

Also, since $P(z)$ is an approximation to a half-band lowpass filter, its spectral factor will be such an approximation as well. Now, $P(e^{j\omega})$ might not be positive everywhere, in which case it is not an autocorrelation and has to be modified. The following trick can be used to find an autocorrelation sequence $p'[n]$ close to $p[n]$ [271]. Find the minimum of $P(e^{j\omega})$, $\delta_{\min} = \min_\omega P(e^{j\omega})$. If $\delta_{\min} > 0$, we need not do anything; otherwise, subtract it from $p[0]$ to get the sequence $p'[n]$. Now,

$$P'(e^{j\omega}) = P(e^{j\omega}) - \delta_{\min} \geq 0,$$

and $P'(z)$ still satisfies (3.2.57) up to a scale factor $(1 - \delta_{\min})$, which can be divided out.
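The construction of Example 3.1 and the correction of $p[0]$ can be sketched in a few lines. This is a minimal sketch under stated assumptions: the Hann-type window and the choice K = 4 are ours (the example leaves the window open), and $\delta_{\min}$ is estimated on a frequency grid, so the corrected response is only guaranteed to be nonnegative at the grid points. The spectral factorization of the resulting $p'[n]$ (Section 2.5.2) is not shown.

```python
import math


def windowed_halfband(K, window):
    """p[n] = w[n] sin(pi n/2) / (pi n/2) for n = -2K+1, ..., 2K-1, as a dict n -> p[n]."""
    p = {}
    for n in range(-2 * K + 1, 2 * K):
        sinc = 1.0 if n == 0 else math.sin(math.pi * n / 2) / (math.pi * n / 2)
        p[n] = window(n) * sinc
    return p


def response(p, w):
    """P(e^{jw}); real-valued because p[n] = p[-n]."""
    return p[0] + 2 * sum(p[n] * math.cos(w * n) for n in p if n > 0)


def make_autocorrelation(p, grid=4096):
    """Smith-Barnwell correction: subtract delta_min from p[0], divide by (1 - delta_min).

    delta_min is estimated on a uniform grid over [0, pi] (P is even in w), so
    between grid points the corrected response may still dip very slightly below 0."""
    d_min = min(response(p, math.pi * i / grid) for i in range(grid + 1))
    if d_min > 0:
        return dict(p), d_min          # already nonnegative on the grid
    q = dict(p)
    q[0] -= d_min
    return {n: v / (1 - d_min) for n, v in q.items()}, d_min


K = 4  # hypothetical choice: 4K - 1 = 15 taps in p, spectral factor of length 2K = 8


def window(n):
    """Assumed Hann-type window, symmetric with w[0] = 1 (not prescribed by the text)."""
    return 0.5 * (1 + math.cos(math.pi * n / (2 * K)))


p_valid, d_min = make_autocorrelation(windowed_halfband(K, window))
```

Dividing by $(1 - \delta_{\min})$ restores $p'[0] = 1$, so the corrected sequence again satisfies (3.2.57) exactly.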
Figure 3.5 Orthogonal filter designs. Magnitude responses of: (a) Smith and Barnwell filter of length 8 [271], (b) Daubechies' filter of length 8 ($D_4$) [71], (c) Vaidyanathan and Hoang filter of length 8 [310], (d) Butterworth filter for $N = 4$ [133].



An example of a design for N = 8 by Smith and Barnwell is given in Figure 
3.5(a) (magnitude responses) and Table 3.2 (impulse response coefficients) [271]. 

Another example based on spectral factorization is Daubechies' family of maximally flat filters [71]. Daubechies' purpose was that the filters should lead to continuous-time wavelet bases (see Section 4.4). The design procedure then amounts to finding orthogonal lowpass filters with a large number of zeros at $\omega = \pi$. Equivalently, one has to design an autocorrelation satisfying (3.2.46) and having many zeros at $\omega = \pi$. That is, we want

$$P(z) = (1 + z^{-1})^k (1 + z)^k R(z),$$

which satisfies (3.2.57), where $R(z)$ is symmetric ($R(z^{-1}) = R(z)$) and positive on the unit circle, $R(e^{j\omega}) > 0$. Of particular interest is the case when $R(z)$ is of minimal degree, which turns out to be when $R(z)$ has powers of $z$ going from $(-k+1)$ to $(k-1)$. Once the solution to this constrained problem is found, a spectral factorization of $R(z)$, combined with the factor $(1 + z^{-1})^k$, yields the desired filter $G_0(z)$, which automatically has $k$ zeros at $\pi$. As always with spectral factorization, there is a choice of taking zeros either inside or outside the unit circle. Taking them systematically from inside the unit circle leads to Daubechies' family of minimum-phase filters.

Table 3.2 Impulse response coefficients for Smith and Barnwell filter [271], Daubechies' filter $D_4$ [71] and Vaidyanathan and Hoang filter [310] (all of length 8).

n    Smith and Barnwell    Daubechies     Vaidyanathan and Hoang
0     0.04935260            0.23037781     0.27844300
1    -0.01553230            0.71484657     0.73454200
2    -0.08890390            0.63088076     0.58191000
3     0.31665300           -0.02798376    -0.05046140
4     0.78751500           -0.18703481    -0.19487100
5     0.50625500            0.03084138     0.03547370
6    -0.03380010            0.03288301     0.04692520
7    -0.10739700           -0.01059740    -0.01778800

The function $R(z)$ which is required so that $P(z)$ satisfies (3.2.57) can be found by solving a system of linear equations, or a closed form is possible in the minimum-degree case [71]. Let us indicate a straightforward approach leading to a system of linear equations. Assume the minimum-degree solution. Then $P(z)$ has powers of $z$ going from $(-2k+1)$ to $(2k-1)$, and (3.2.57) puts $2k - 1$ constraints on $P(z)$. But because $P(z)$ is symmetric, $k - 1$ of them are redundant, leaving $k$ active constraints. Because $R(z)$ is symmetric, it has $k$ degrees of freedom (out of its $2k - 1$ nonzero coefficients). Since $P(z)$ is the convolution of $(1 + z^{-1})^k (1 + z)^k$ with $R(z)$, it can be written as a matrix-vector product, where the matrix contains the impulse response of $(1 + z^{-1})^k (1 + z)^k$ and its shifts. Gathering the even terms of this matrix-vector product (which correspond to the $k$ constraints) and expressing them in terms of the $k$ free parameters of $R(z)$ leads to the desired $k \times k$ system of equations. It is interesting to note that the matrix involved is never singular, and the $R(z)$ obtained by solving the system of equations is positive on the unit circle. Therefore, this method automatically leads to an autocorrelation and, by spectral factorization, to an orthogonal filter bank with filters of length $2k$ having $k$ zeros at $\pi$ and $0$ for the lowpass and highpass, respectively.
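The linear system described above is small enough to set up and solve exactly. The sketch below shows one way to do it with rational arithmetic. The indexing of the free parameters (r[0] for the center tap, r[m] for the pair $z^m, z^{-m}$) and all function names are our own conventions. For $k = 2$ it should reproduce the values $a = -1/16$, $b = 1/4$ of Example 3.2. For other $k$, the test checks the constraints on $P(z)$ and the positivity of $R(e^{j\omega})$ on a frequency grid, as the text asserts.

```python
import math
from fractions import Fraction
from math import comb


def solve_exact(A, b):
    """Gauss-Jordan elimination over the rationals; A is assumed nonsingular."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def q_coeff(k, n):
    """Coefficient of z^n in (1 + z^-1)^k (1 + z)^k = (z + 2 + z^-1)^k."""
    return comb(2 * k, k + n) if abs(n) <= k else 0


def daubechies_R(k):
    """Free parameters r[0..k-1] of the minimum-degree R(z) = r[0] + sum_m r[m](z^m + z^-m)."""
    # Row i expresses p[2i] through the free parameters; p[0] = 1 and p[2i] = 0 for i > 0.
    A = [[Fraction(q_coeff(k, 2 * i)) if j == 0
          else Fraction(q_coeff(k, 2 * i - j) + q_coeff(k, 2 * i + j))
          for j in range(k)] for i in range(k)]
    rhs = [Fraction(1)] + [Fraction(0)] * (k - 1)
    return solve_exact(A, rhs)


def autocorrelation(k, r):
    """All coefficients p[n] of P(z) = (1 + z^-1)^k (1 + z)^k R(z), as a dict."""
    R = {0: r[0]}
    for m in range(1, k):
        R[m] = R[-m] = r[m]
    p = {}
    for a in range(-k, k + 1):
        for b, rb in R.items():
            p[a + b] = p.get(a + b, 0) + q_coeff(k, a) * rb
    return p


def R_on_circle(r, w):
    """R(e^{jw}) = r[0] + 2 sum_m r[m] cos(m w), evaluated in floating point."""
    return float(r[0]) + 2 * sum(float(r[m]) * math.cos(m * w) for m in range(1, len(r)))


print(daubechies_R(2))  # [Fraction(1, 4), Fraction(-1, 16)], i.e. b = 1/4 and a = -1/16
```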

As an example, we will construct Daubechies' $D_2$ filter, that is, a length-4 orthogonal filter with two zeros at $\omega = \pi$ (the maximum number of zeros at $\pi$ is equal to half the length, and is indicated by the subscript).
Example 3.2 

Let us choose $k = 2$ and construct length-4 filters. This means that

$$P(z) = G_0(z)G_0(z^{-1}) = (1 + z^{-1})^2 (1 + z)^2 R(z).$$

Now, recall that since $P(z) + P(-z) = 2$, all even-indexed coefficients in $P(z)$ equal 0, except for $p[0] = 1$. To obtain a length-4 filter, the highest-degree term has to be $z^{-3}$, and thus $R(z)$ is of the form

$$R(z) = az + b + az^{-1}. \qquad (3.2.58)$$

Substituting (3.2.58) into $P(z)$ we obtain

$$P(z) = az^3 + (4a + b)z^2 + (7a + 4b)z + (8a + 6b) + (7a + 4b)z^{-1} + (4a + b)z^{-2} + az^{-3}.$$

Equating the coefficients of $z^2$ or $z^{-2}$ with 0, and the one with $z^0$ with 1, yields

$$4a + b = 0, \qquad 8a + 6b = 1.$$

The solution to this system of equations is

$$a = -\frac{1}{16}, \qquad b = \frac{1}{4},$$

yielding the following $R(z)$:

$$R(z) = -\frac{1}{16}z + \frac{1}{4} - \frac{1}{16}z^{-1}.$$

We now factor $R(z)$ as

$$R(z) = \left(\frac{1}{4\sqrt{2}}\right)^2 \left(1 + \sqrt{3} + (1 - \sqrt{3})z^{-1}\right)\left(1 + \sqrt{3} + (1 - \sqrt{3})z\right).$$

Taking the term with the zero inside the unit circle, that is, $(1 + \sqrt{3} + (1 - \sqrt{3})z^{-1})$, we obtain the filter $G_0(z)$ as

$$G_0(z) = \frac{1}{4\sqrt{2}}(1 + z^{-1})^2\left(1 + \sqrt{3} + (1 - \sqrt{3})z^{-1}\right) = \frac{1}{4\sqrt{2}}\left((1 + \sqrt{3}) + (3 + \sqrt{3})z^{-1} + (3 - \sqrt{3})z^{-2} + (1 - \sqrt{3})z^{-3}\right). \qquad (3.2.59)$$

Note that this lowpass filter has a double zero at $z = -1$ (important for constructing wavelet bases, as will be seen in Section 4.4). A longer filter with four zeros at $\omega = \pi$ is shown in Figure 3.5(b) (magnitude responses of the lowpass/highpass pair), while the impulse response coefficients are given in Table 3.2 [71].
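A few lines of Python are enough to check the outcome of Example 3.2 numerically. The sketch below multiplies out (3.2.59) and verifies three things: the autocorrelation of the resulting filter reproduces $P(z)$ with $a = -1/16$, $b = 1/4$ (so $p[0] = 1$, $p[\pm 2] = 0$, $p[\pm 1] = 9/16$, $p[\pm 3] = -1/16$); $G_0(z)$ has a double zero at $z = -1$; and the retained zero lies inside the unit circle. The helper names are ours.

```python
import math


def poly_mul(a, b):
    """Product of two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def autocorr(g, m):
    """p[m] = sum_n g[n] g[n + |m|], the autocorrelation of a real filter."""
    m = abs(m)
    return sum(g[n] * g[n + m] for n in range(len(g) - m))


s3 = math.sqrt(3)
# Spectral factor of R(z) with its zero inside the unit circle, times (1 + z^-1)^2
g0 = [v / (4 * math.sqrt(2)) for v in poly_mul([1.0, 2.0, 1.0], [1 + s3, 1 - s3])]
zero_of_factor = -(1 - s3) / (1 + s3)  # root of (1 + sqrt3) + (1 - sqrt3) z^-1 = 0
```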
Figure 3.6 Two-channel lattice factorization of paraunitary filter banks. The $2 \times 2$ blocks $U_i$ are rotation matrices.



Designs Based on Vaidyanathan and Hoang Lattice Factorizations An alternative and numerically well-conditioned procedure relies on the fact that paraunitary matrices, just like unitary matrices, possess canonical factorizations⁶ into elementary paraunitary matrices [305, 310] (see also Appendix 3.A). Thus, all paraunitary filter banks with FIR filters of length $L = 2K$ can be reached by the following lattice structure (here $G_1(z) = z^{-2K+1}G_0(-z^{-1})$):

$$G_p(z) = \begin{pmatrix} G_{00}(z) & G_{10}(z) \\ G_{01}(z) & G_{11}(z) \end{pmatrix} = U_0 \prod_{i=1}^{K-1} \begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix} U_i, \qquad (3.2.60)$$

where $U_i$ is a $2 \times 2$ rotation matrix given in (2.B.1),

$$U_i = \begin{pmatrix} \cos\alpha_i & -\sin\alpha_i \\ \sin\alpha_i & \cos\alpha_i \end{pmatrix}.$$



That the resulting structure is paraunitary is easy to check (it is the product of 
paraunitary elementary blocks). What is much more interesting is that all pa- 
raunitary matrices of a given degree can be written in this form [310] (see also 
Appendix 3.A.1). The lattice factorization is given in Figure 3.6. 
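The lattice (3.2.60) is straightforward to evaluate numerically. The sketch below (helper names are ours) multiplies out the rotations and delays for an arbitrary list of angles and reads off the two length-$2K$ filters from the columns of $G_p(z)$. The test checks that the filters are orthonormal with respect to even shifts, whatever the angles. It also checks that the highpass filter is the modulated, time-reversed lowpass filter with the sign $G_1(z) = z^{-2K+1}G_0(-z^{-1})$ produced by this rotation convention (Corollary 3.9(c) fixes $G_1(z)$ only up to a sign). Finally, it checks that the angles found in Example 3.3 give the $D_2$ filter.

```python
import math


def padd(a, b):
    """Sum of two polynomials in z^-1 (coefficient lists)."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0.0) + (b[i] if i < len(b) else 0.0) for i in range(n)]


def pmul(a, b):
    """Product of two polynomials in z^-1."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def mat_mul(A, B):
    """Product of 2x2 matrices whose entries are polynomials in z^-1."""
    return [[padd(pmul(A[i][0], B[0][j]), pmul(A[i][1], B[1][j])) for j in range(2)]
            for i in range(2)]


def rotation(a):
    """U = [[cos a, -sin a], [sin a, cos a]] with constant polynomial entries."""
    return [[[math.cos(a)], [-math.sin(a)]], [[math.sin(a)], [math.cos(a)]]]


def lattice_filters(angles):
    """Length-2K filters g0, g1 from G_p(z) = U_0 prod_i diag(1, z^-1) U_i, K = len(angles).

    Column c of G_p holds (G_c0, G_c1), and G_c(z) = G_c0(z^2) + z^-1 G_c1(z^2)."""
    delay = [[[1.0], [0.0]], [[0.0], [0.0, 1.0]]]  # diag(1, z^-1)
    Gp = rotation(angles[0])
    for a in angles[1:]:
        Gp = mat_mul(mat_mul(Gp, delay), rotation(a))
    K = len(angles)
    filters = []
    for col in range(2):
        g = [0.0] * (2 * K)
        for i, c in enumerate(Gp[0][col]):   # even-indexed polyphase component
            g[2 * i] += c
        for i, c in enumerate(Gp[1][col]):   # odd-indexed polyphase component
            g[2 * i + 1] += c
        filters.append(g)
    return filters


def inner(a, b, shift):
    """sum_n a[n] b[n + shift] for finite sequences starting at n = 0."""
    return sum(a[n] * b[n + shift] for n in range(len(a)) if 0 <= n + shift < len(b))
```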

As an example of this approach, we construct the $D_2$ filter from the previous example, using the lattice factorization.



Example 3.3 

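We construct the $D_2$ filter, which is of length 4, thus $L = 2K = 4$. This means that

$$G_p(z) = \begin{pmatrix} \cos\alpha_0 & -\sin\alpha_0 \\ \sin\alpha_0 & \cos\alpha_0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix} \begin{pmatrix} \cos\alpha_1 & -\sin\alpha_1 \\ \sin\alpha_1 & \cos\alpha_1 \end{pmatrix} = \begin{pmatrix} \cos\alpha_0\cos\alpha_1 - \sin\alpha_0\sin\alpha_1 z^{-1} & -\cos\alpha_0\sin\alpha_1 - \sin\alpha_0\cos\alpha_1 z^{-1} \\ \sin\alpha_0\cos\alpha_1 + \cos\alpha_0\sin\alpha_1 z^{-1} & -\sin\alpha_0\sin\alpha_1 + \cos\alpha_0\cos\alpha_1 z^{-1} \end{pmatrix}. \qquad (3.2.61)$$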



6 By canonical we mean complete factorizations with a minimum number of free parameters. 
However, such factorizations are not unique in general. 
We get the lowpass filter $G_0(z)$ as

$$G_0(z) = G_{00}(z^2) + z^{-1}G_{01}(z^2) = \cos\alpha_0\cos\alpha_1 + \sin\alpha_0\cos\alpha_1 z^{-1} - \sin\alpha_0\sin\alpha_1 z^{-2} + \cos\alpha_0\sin\alpha_1 z^{-3}.$$

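We now obtain the $D_2$ filter by imposing a second-order zero at $z = -1$. So, we obtain the first equation as

$$G_0(-1) = \cos\alpha_1\cos\alpha_0 - \cos\alpha_1\sin\alpha_0 - \sin\alpha_1\sin\alpha_0 - \sin\alpha_1\cos\alpha_0 = 0,$$

or,

$$\cos(\alpha_0 + \alpha_1) - \sin(\alpha_0 + \alpha_1) = 0.$$

This equation implies that

$$\alpha_0 + \alpha_1 = k\pi + \frac{\pi}{4}.$$

Since we also know that $G_0(1) = \sqrt{2}$ (see (3.2.55)),

$$\cos(\alpha_0 + \alpha_1) + \sin(\alpha_0 + \alpha_1) = \sqrt{2},$$

we get that

$$\alpha_0 + \alpha_1 = \frac{\pi}{4}. \qquad (3.2.62)$$

Imposing now a zero at $e^{j\omega} = -1$ on the derivative of $G_0(e^{j\omega})$,

$$\left.\frac{dG_0(e^{j\omega})}{d\omega}\right|_{\omega = \pi} = 0,$$

we obtain

$$\cos\alpha_1\sin\alpha_0 + 2\sin\alpha_1\sin\alpha_0 + 3\sin\alpha_1\cos\alpha_0 = 0. \qquad (3.2.63)$$

Solving (3.2.62) and (3.2.63), we obtain

$$\alpha_0 = \frac{\pi}{3}, \qquad \alpha_1 = -\frac{\pi}{12}.$$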

Substituting the angles $\alpha_0, \alpha_1$ into the expression for $G_0(z)$ obtained from (3.2.61) and comparing it to (3.2.59), we can see that we have indeed obtained the $D_2$ filter.
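The final step of Example 3.3 is easy to confirm numerically. The sketch below (names are ours) evaluates $G_0(z)$ from (3.2.61) for $\alpha_0 = \pi/3$, $\alpha_1 = -\pi/12$. It computes the residuals of the three conditions imposed in the example and compares the coefficients with (3.2.59).

```python
import math


def lattice_lowpass_k2(a0, a1):
    """G0(z) for K = 2, read off (3.2.61) through G0(z) = G00(z^2) + z^-1 G01(z^2)."""
    return [math.cos(a0) * math.cos(a1), math.sin(a0) * math.cos(a1),
            -math.sin(a0) * math.sin(a1), math.cos(a0) * math.sin(a1)]


a0, a1 = math.pi / 3, -math.pi / 12          # solution of (3.2.62)-(3.2.63)
g0 = lattice_lowpass_k2(a0, a1)

# Residuals of the three conditions imposed in Example 3.3
zero_at_pi = sum((-1) ** n * c for n, c in enumerate(g0))        # G0(-1) = 0
dc_gain = sum(g0) - math.sqrt(2)                                  # G0(1) = sqrt(2)
derivative = (math.cos(a1) * math.sin(a0) + 2 * math.sin(a1) * math.sin(a0)
              + 3 * math.sin(a1) * math.cos(a0))                  # (3.2.63)

s3 = math.sqrt(3)
d2 = [v / (4 * math.sqrt(2)) for v in (1 + s3, 3 + s3, 3 - s3, 1 - s3)]  # (3.2.59)
```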

An example of a longer filter obtained by lattice factorization is given in Fig- 
ure 3.5(c) (magnitude responses) and Table 3.2 (impulse response coefficients). This 
design example was obtained by Vaidyanathan and Hoang in [310]. 

3.2.4 Linear Phase FIR Filter Banks 

Orthogonal filter banks have many nice features (conservation of energy, identical analysis and synthesis) but also some restrictions. In particular, there are no orthogonal linear phase solutions with real FIR filters (see Proposition 3.12) except in some trivial cases (such as the Haar filters). Since linear phase filter banks yield biorthogonal expansions, four filters are involved, namely $H_0$, $H_1$ at analysis, and $G_0$ and $G_1$ at synthesis. In our discussions, we will often concentrate on $H_0$ and $H_1$ first (that is, in this case we design the analysis part of the system, or, one of the two biorthogonal bases).

First, note that if a filter is linear phase, then it can be written as

$$H(z) = \pm z^{-L+1} H(z^{-1}), \qquad (3.2.64)$$

where $\pm$ means it is a symmetric/antisymmetric filter, respectively, and $L$ denotes the filter's length. Note that here we have assumed that $H(z)$ has the impulse response ranging from $h[0], \ldots, h[L-1]$ (otherwise, modify (3.2.64) with a phase factor). Recall from Proposition 3.6 that perfect reconstruction FIR solutions are possible if and only if the matrix $H_p(z)$ (or equivalently $H_m(z)$) has a determinant equal to a delay, that is [319]

$$H_{00}(z)H_{11}(z) - H_{01}(z)H_{10}(z) = z^{-l}, \qquad (3.2.65)$$

$$H_0(z)H_1(-z) - H_0(-z)H_1(z) = 2z^{-2l-1}. \qquad (3.2.66)$$

The left-hand side of (3.2.65) is the determinant of the polyphase matrix $H_p(z)$, while the left-hand side of (3.2.66) is the determinant of the modulation matrix $H_m(z)$. The synthesis filters are then equal to (see (3.2.30)-(3.2.31))

$$G_0(z) = z^{-k}H_1(-z), \qquad G_1(z) = -z^{-k}H_0(-z),$$

where $k$ is an arbitrary shift.
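The linear phase property (3.2.64) and the determinant condition (3.2.66) are easy to test for filters with integer or rational coefficients. The sketch below does so in Python. The function names are hypothetical, and the test only decides whether the determinant reduces to a single term $c z^{-m}$ (a scaled delay), which is what (3.2.66) requires up to normalization.

```python
def poly_mul(a, b):
    """Product of two polynomials in z^-1 (coefficient lists)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def modulate(h):
    """Coefficients of H(-z): h[n] -> (-1)^n h[n]."""
    return [(-1) ** n * c for n, c in enumerate(h)]


def symmetry(h):
    """+1 if h is symmetric, -1 if antisymmetric, 0 otherwise (cf. (3.2.64))."""
    if h == h[::-1]:
        return 1
    if h == [-c for c in h[::-1]]:
        return -1
    return 0


def det_modulation(h0, h1):
    """Coefficients of H0(z)H1(-z) - H0(-z)H1(z), the left-hand side of (3.2.66)."""
    a = poly_mul(h0, modulate(h1))
    b = poly_mul(modulate(h0), h1)
    return [x - y for x, y in zip(a, b)]


def is_pr_pair(h0, h1):
    """True if the determinant reduces to a single term c z^-m with c != 0."""
    return sum(1 for c in det_modulation(h0, h1) if c != 0) == 1
```

For example, the unnormalized Haar pair [1, 1], [-1, 1] gives the determinant $-4z^{-1}$. The pair [1, 2, 1], [-1, -2, 6, -2, -1] (both symmetric, of odd lengths differing by 2) gives $32z^{-3}$.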

Of particular interest is the case when both $H_0(z)$ and $H_1(z)$ are linear phase (symmetric or antisymmetric) filters. Then, as in the paraunitary case, there are certain restrictions on possible filters [315, 319].

Proposition 3.11 

In a two-channel, perfect reconstruction filter bank, where all filters are linear 
phase, the analysis filters have one of the following forms: 

(a) Both filters are symmetric and of odd lengths, differing by an odd mul- 
tiple of 2. 

(b) One filter is symmetric and the other is antisymmetric; both lengths are 
even, and are equal or differ by an even multiple of 2. 

(c) One filter is of odd length, the other one of even length; both have all zeros on the unit circle. Either both filters are symmetric, or one is symmetric and the other one is antisymmetric (this is a degenerate case).
The proof can be found in [319] and is left as an exercise (see Problem 3.8). We will discuss it briefly. The idea is to consider the product polynomial $P(z) = H_0(z)H_1(-z)$ that has to satisfy (3.2.66). Because $H_0(z)$ and $H_1(z)$ (as well as $H_1(-z)$) are linear phase, so is $P(z)$. Because of (3.2.66), when $P(z)$ has more than two nonzero coefficients, it has to be symmetric with one central coefficient at $2l + 1$. Also, the end terms of $P(z)$ have to be of an even index, so they cancel in $P(z) - P(-z)$. The above two requirements lead to the symmetry and length constraints for cases (a) and (b). In addition, there is a degenerate case (c), of little practical interest, when $P(z)$ has only two nonzero coefficients,

$$P(z) = z^{-2l-1}\left(1 \pm z^{-(2N-2l-1)}\right),$$

for some integer $N$, which leads to zeros at odd roots of $\pm 1$. Because these are distributed among $H_0(z)$ and $H_1(-z)$ (rather than $H_1(z)$), the resulting filters will be a poor set of lowpass and highpass filters.

Another result that we mentioned at the beginning of this section is: 

Proposition 3.12 

There are no two-channel perfect reconstruction, orthogonal filter banks, with 
filters being FIR, linear phase, and with real coefficients (except for the Haar 
filters). 

Proof 

We know from Theorem 3.8 that orthonormality implies that

$$H_p(z)H_p^T(z^{-1}) = I,$$

which further means that

$$H_{00}(z)H_{00}(z^{-1}) + H_{01}(z)H_{01}(z^{-1}) = 1. \qquad (3.2.67)$$

We also know that in orthogonal filter banks, the filters are of even length. Therefore, following Proposition 3.11, one filter is symmetric and the other one is antisymmetric. Take the symmetric one, $H_0(z)$ for example, and use (3.2.64):

$$H_0(z) = H_{00}(z^2) + z^{-1}H_{01}(z^2) = z^{-L+1}H_0(z^{-1}) = z^{-L+1}\left(H_{00}(z^{-2}) + zH_{01}(z^{-2})\right) = z^{-L+2}H_{01}(z^{-2}) + z^{-1}\left(z^{-L+2}H_{00}(z^{-2})\right).$$

This further means that the polyphase components are related as

$$H_{00}(z) = z^{-L/2+1}H_{01}(z^{-1}), \qquad H_{01}(z) = z^{-L/2+1}H_{00}(z^{-1}). \qquad (3.2.68)$$

Substituting the second equation from (3.2.68) into (3.2.67) we obtain

$$H_{00}(z)H_{00}(z^{-1}) = \frac{1}{2}.$$

However, the only FIR, real-coefficient polynomial satisfying the above is

$$H_{00}(z) = \frac{1}{\sqrt{2}}z^{-l}.$$

Performing a similar analysis for $H_{01}(z)$, we obtain that $H_{01}(z) = \frac{1}{\sqrt{2}}z^{-k}$, which, in turn, means that

$$H_0(z) = \frac{1}{\sqrt{2}}\left(z^{-2l} + z^{-2k-1}\right), \qquad H_1(z) = H_0(-z),$$

or, the only solution yields Haar filters ($l = k = 0$) or trivial variations thereof.

We now shift our attention to design issues. 

Lattice Structure for Linear Phase Filters Unlike in the paraunitary case, there are no canonical factorizations for general matrices of polynomials.⁷ But there are lattice structures that will produce, for example, linear phase perfect reconstruction filters [208, 321]. To obtain such a structure, note that $H_p(z)$ has to satisfy (if the filters are of the same length $L = 2K$)

$$H_p(z) = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} z^{-K+1} H_p(z^{-1}) \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \qquad (3.2.69)$$

Here, we assume that $H_i(z) = H_{i0}(z^2) + z^{-1}H_{i1}(z^2)$ in order to have causal filters. This is referred to as the linear phase testing condition (see Problem 3.9). Then, assume that $H_p(z)$ satisfies (3.2.69) and construct $H'_p(z)$ as

$$H'_p(z) = H_p(z)\begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix}\begin{pmatrix} 1 & \alpha \\ \alpha & 1 \end{pmatrix}.$$

It is then easy to show that $H'_p(z)$ satisfies (3.2.69) as well (with $K$ replaced by $K + 1$). The lattice

$$H_p(z) = C\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}\prod_{i=1}^{K-1}\begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix}\begin{pmatrix} 1 & \alpha_i \\ \alpha_i & 1 \end{pmatrix}, \qquad (3.2.70)$$

with $C = -\frac{1}{2}\prod_{i=1}^{K-1}\frac{1}{1 - \alpha_i^2}$, produces length $L = 2K$ symmetric (lowpass) and antisymmetric (highpass) filters leading to perfect reconstruction filter banks. Note that the structure is incomplete [321] and that $|\alpha_i| \neq 1$. Again, just as in the paraunitary lattice, perfect reconstruction is structurally guaranteed within a scale factor (in the synthesis, simply replace $\alpha_i$ by $-\alpha_i$ and pick $C = 1$).
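The lattice (3.2.70) can be multiplied out exactly with rational arithmetic. In the sketch below (function names are ours), each stage multiplies by $\mathrm{diag}(1, z^{-1})$ and then by the symmetric matrix with parameter $\alpha_i$. The filters are read off the rows of $H_p(z)$. With the single parameter $\alpha = 3$ it reproduces (3.2.71)-(3.2.72) of Example 3.4. For other parameter values, the test checks that the lowpass filter is symmetric, that the highpass filter is antisymmetric, and that the modulation-matrix determinant is a single delay term.

```python
from fractions import Fraction


def padd(a, b):
    """Sum of two polynomials in z^-1 (coefficient lists)."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)]


def pmul(a, b):
    """Product of two polynomials in z^-1."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def mat_mul(A, B):
    """Product of 2x2 matrices whose entries are polynomials in z^-1."""
    return [[padd(pmul(A[i][0], B[0][j]), pmul(A[i][1], B[1][j])) for j in range(2)]
            for i in range(2)]


def linear_phase_lattice(alphas):
    """Analysis filters h0, h1 of length 2K (K = len(alphas) + 1) from the lattice (3.2.70).

    Row r of H_p holds (H_r0, H_r1) and H_r(z) = H_r0(z^2) + z^-1 H_r1(z^2)."""
    C = Fraction(-1, 2)
    for a in alphas:
        C /= 1 - Fraction(a) ** 2          # |alpha_i| != 1 is required
    Hp = [[[C], [C]], [[-C], [C]]]         # C [[1, 1], [-1, 1]]
    delay = [[[1], [0]], [[0], [0, 1]]]    # diag(1, z^-1)
    for a in alphas:
        Hp = mat_mul(mat_mul(Hp, delay), [[[1], [a]], [[a], [1]]])
    K = len(alphas) + 1
    filters = []
    for row in Hp:
        h = [Fraction(0)] * (2 * K)
        for i, c in enumerate(row[0]):     # even-indexed polyphase component
            h[2 * i] += c
        for i, c in enumerate(row[1]):     # odd-indexed polyphase component
            h[2 * i + 1] += c
        filters.append(h)
    return filters


def modulation_det(h0, h1):
    """Coefficients of H0(z)H1(-z) - H0(-z)H1(z)."""
    def mod(h):
        return [(-1) ** n * c for n, c in enumerate(h)]
    a, b = pmul(h0, mod(h1)), pmul(mod(h0), h1)
    return [x - y for x, y in zip(a, b)]
```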



7 There exist factorizations of polynomial matrices based on ladder steps [151], but they are not 
canonical like the lattice structure in (3.2.60). 
Table 3.3 Impulse response coefficients for analysis and synthesis filters in two different linear phase cases. There is a factor of 1/16 to be distributed between $h_i[n]$ and $g_i[n]$, like {1/4, 1/4} or {1/16, 1} (the latter was used in the text).

n    h0[n]  h1[n]  g0[n]  g1[n]   |   h0[n]  h1[n]  g0[n]  g1[n]
0      1     -1     -1     -1     |     1     -1     -1     -1
1      3     -3      3      3     |     2     -2      2      2
2      3      3      3     -3     |     1      6      6     -1
3      1      1     -1      1     |           -2      2
4                                 |           -1     -1

Example 3.4 



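Let us construct filters of length 4 where the lowpass has a maximum number of zeros at $z = -1$ (that is, the linear phase counterpart of the $D_2$ filter). From the cascade structure,

$$H_p(z) = \frac{-1}{2(1 - \alpha^2)}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix}\begin{pmatrix} 1 & \alpha \\ \alpha & 1 \end{pmatrix} = \frac{-1}{2(1 - \alpha^2)}\begin{pmatrix} 1 + \alpha z^{-1} & \alpha + z^{-1} \\ -1 + \alpha z^{-1} & -\alpha + z^{-1} \end{pmatrix}.$$

We can now find the filter $H_0(z)$ as

$$H_0(z) = H_{00}(z^2) + z^{-1}H_{01}(z^2) = \frac{1 + \alpha z^{-1} + \alpha z^{-2} + z^{-3}}{-2(1 - \alpha^2)}.$$

Because $H_0(z)$ is an even-length symmetric filter, it automatically has a zero at $z = -1$, or $H_0(-1) = 0$. Take now the first derivative of $H_0(e^{j\omega})$ at $\omega = \pi$ and set it to 0 (which corresponds to imposing a double zero at $z = -1$):

$$\left.\frac{dH_0(e^{j\omega})}{d\omega}\right|_{\omega = \pi} = \frac{-j}{2(1 - \alpha^2)}(\alpha - 2\alpha + 3) = 0,$$

leading to $\alpha = 3$. Substituting this into the expression for $H_0(z)$, we get

$$H_0(z) = \frac{1}{16}\left(1 + 3z^{-1} + 3z^{-2} + z^{-3}\right) = \frac{1}{16}(1 + z^{-1})^3, \qquad (3.2.71)$$

which means that $H_0(z)$ has a triple zero at $z = -1$. The highpass filter is equal to

$$H_1(z) = \frac{1}{16}\left(-1 - 3z^{-1} + 3z^{-2} + z^{-3}\right). \qquad (3.2.72)$$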



Note that $\det(H_m(z)) = (1/8)z^{-3}$. Following (3.2.30)-(3.2.31), $G_0(z) = 16z^3 H_1(-z)$ and $G_1(z) = -16z^3 H_0(-z)$. A causal version simply skips the $z^3$ factor. Recall that the key to perfect reconstruction is the product $P(z) = H_0(z) \cdot H_1(-z)$ in (3.2.66), which in this case equals (using (3.2.71)-(3.2.72))

$$P(z) = \frac{1}{256}\left(-1 + 9z^{-2} + 16z^{-3} + 9z^{-4} - z^{-6}\right),$$

that is, the same $P(z)$ as in Example 3.2, up to a delay and a scale factor. One can refactor this $P(z)$ into a different set $\{H'_0(z), H'_1(-z)\}$, such as, for example,

$$P(z) = H'_0(z)H'_1(-z) = \frac{1}{16}\left(1 + 2z^{-1} + z^{-2}\right) \cdot \frac{1}{16}\left(-1 + 2z^{-1} + 6z^{-2} + 2z^{-3} - z^{-4}\right),$$

that is, odd-length linear phase lowpass and highpass filters with impulse responses $\frac{1}{16}[1, 2, 1]$ and $\frac{1}{16}[-1, -2, 6, -2, -1]$, respectively. Table 3.3 gives impulse response coefficients for both analysis and synthesis filters for the two cases given above.
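The perfect reconstruction claims of Example 3.4 can be checked with exact rational arithmetic. The sketch below (helper names are ours) forms $P(z) = H_0(z)H_1(-z)$ and the determinant $P(z) - P(-z)$. It builds the causal synthesis filters of (3.2.30)-(3.2.31) with the $z^3$ factor dropped, then the resulting distortion and aliasing terms. It also confirms that the odd-length refactorization gives the same $P(z)$.

```python
from fractions import Fraction as F


def pmul(a, b):
    """Product of two polynomials in z^-1 (coefficient lists)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def mod(h):
    """Coefficients of H(-z)."""
    return [(-1) ** n * c for n, c in enumerate(h)]


h0 = [F(c, 16) for c in (1, 3, 3, 1)]      # (3.2.71)
h1 = [F(c, 16) for c in (-1, -3, 3, 1)]    # (3.2.72)

P = pmul(h0, mod(h1))                       # P(z) = H0(z) H1(-z)
det_m = [x - y for x, y in zip(P, mod(P))]  # P(z) - P(-z) = det H_m(z)

# Causal synthesis filters (z^3 factor skipped): G0 = 16 H1(-z), G1 = -16 H0(-z)
g0 = [16 * c for c in mod(h1)]
g1 = [-16 * c for c in mod(h0)]

# Distortion G0 H0 + G1 H1 and aliasing G0 H0(-z) + G1 H1(-z)
distortion = [x + y for x, y in zip(pmul(g0, h0), pmul(g1, h1))]
aliasing = [x + y for x, y in zip(pmul(g0, mod(h0)), pmul(g1, mod(h1)))]

# Refactorization of the same P(z) into odd-length linear phase filters
P_alt = pmul([F(c, 16) for c in (1, 2, 1)], [F(c, 16) for c in (-1, 2, 6, 2, -1)])
```

The distortion term comes out as $2z^{-3}$ and the aliasing term vanishes, which is perfect reconstruction with a delay of three samples.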

The above example showed again the central role played by $P(z) = H_0(z) \cdot H_1(-z)$. In some sense, designing two-channel filter banks boils down to designing $P(z)$'s with particular properties, and factoring them in a particular way.

If one relaxes the perfect reconstruction constraint, one can obtain some desirable properties at the cost of some small reconstruction error. For example, popular QMF filters have been designed by Johnston [144], which have linear phase and "almost" perfect reconstruction. The idea is to approximate perfect reconstruction in a QMF solution (see (3.2.37)) as well as possible, while obtaining a good lowpass filter (the highpass filter $H_1(z)$, being equal to $H_0(-z)$, is automatically as good as the lowpass). Therefore, define an objective function depending on two quantities: (a) the stopband attenuation error of $H_0(z)$,

$$S = \int_{\omega_s}^{\pi} |H_0(e^{j\omega})|^2 \, d\omega,$$

where $\omega_s$ is the stopband edge, and (b) the reconstruction error

$$E = \int_0^{\pi} \left(2 - \left(|H_0(e^{j\omega})|^2 + |H_0(e^{j(\omega+\pi)})|^2\right)\right)^2 d\omega.$$

The objective function is

$$O = cS + (1 - c)E,$$

where $c$ assigns the relative cost to these two quantities. Then, $O$ is minimized using the coefficients of $H_0(z)$ as free variables. Such filter designs are tabulated in [67, 144].
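To make the trade-off concrete, the sketch below evaluates $S$, $E$ and $O$ for a given lowpass prototype by numerical integration. It computes only the objective; it is not Johnston's optimizer and does not reproduce his tabulated filters. The stopband edge $\omega_s$, the weight $c$ and the midpoint-rule resolution are free parameters of this sketch. The Haar filter serves as a sanity check: it is power complementary, so $E$ vanishes, while $S = \pi/2 - 1$ in closed form for $\omega_s = \pi/2$.

```python
import cmath
import math


def freq(h, w):
    """H(e^{jw}) for a causal FIR filter h."""
    return sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h))


def johnston_objective(h0, ws, c, num=2000):
    """Return (O, S, E) with O = c*S + (1 - c)*E, integrals by the midpoint rule.

    S: stopband energy of H0 on [ws, pi].
    E: squared deviation of |H0(w)|^2 + |H0(w + pi)|^2 from 2, integrated on [0, pi]."""
    dw_s = (math.pi - ws) / num
    S = sum(abs(freq(h0, ws + (i + 0.5) * dw_s)) ** 2 for i in range(num)) * dw_s
    dw_e = math.pi / num
    E = 0.0
    for i in range(num):
        w = (i + 0.5) * dw_e
        E += (2 - abs(freq(h0, w)) ** 2 - abs(freq(h0, w + math.pi)) ** 2) ** 2
    E *= dw_e
    return c * S + (1 - c) * E, S, E
```

A design procedure in the spirit of the text would hand `johnston_objective` to a general-purpose minimizer over the free coefficients of a symmetric $H_0(z)$.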
Complementary Filters The following question sometimes arises in the design of filter banks: given an FIR filter $H_0(z)$, is there a complementary filter $H_1(z)$ such that the filter bank allows perfect reconstruction with FIR filters? The answer is given by the following proposition, which was first proven in [139]. We will follow the proof in [319]:

Proposition 3.13 

Given a causal FIR filter $H_0(z)$, there exists a complementary filter $H_1(z)$ if and only if the polyphase components of $H_0(z)$ are coprime (except for possible zeros at $z = \infty$).

Proof 

From Proposition 3.6, we know that a necessary and sufficient condition for perfect FIR reconstruction is that $\det(H_p(z))$ be a monomial. Thus, coprimeness is obviously necessary, since if there is a common factor between $H_{00}(z)$ and $H_{01}(z)$, it will show up in the determinant. Sufficiency follows from the Euclidean algorithm or Bezout's identity: given two coprime polynomials $a(z)$ and $b(z)$, the equation $a(z)p(z) + b(z)q(z) = c(z)$ has a solution, which becomes unique once degree constraints are placed on $p(z)$ and $q(z)$ (see, for example, [32]). Thus, choose $c(z) = z^{-k}$; then the solution $\{p(z), q(z)\}$ corresponds to the two polyphase components of $H_1(z)$.

Note that the solution $H_1(z)$ is not unique [32, 319]. Also, coprimeness of $H_{00}(z)$ and $H_{01}(z)$ is equivalent to $H_0(z)$ not having any pair of zeros at locations $\alpha$ and $-\alpha$. This can be used to prove that the filter $H_0(z) = (1 + z^{-1})^N$ always has a complementary filter (see Problem 3.12).

Example 3.5 

Consider the filter $H_0(z) = (1 + z^{-1})^4 = 1 + 4z^{-1} + 6z^{-2} + 4z^{-3} + z^{-4}$. It can be verified that its two polyphase components are coprime, and thus, there is a complementary filter. We will find a solution to the equation

$$\det(H_p(z)) = H_{00}(z) \cdot H_{11}(z) - H_{01}(z) \cdot H_{10}(z) = z^{-1}, \qquad (3.2.73)$$

with $H_{00}(z) = 1 + 6z^{-1} + z^{-2}$ and $H_{01}(z) = 4 + 4z^{-1}$. The right side of (3.2.73) was chosen so that there is a linear phase solution. For example,

$$H_{10}(z) = \frac{1}{16}(1 + z^{-1}), \qquad H_{11}(z) = \frac{1}{4},$$

is a solution to (3.2.73), that is, $H_1(z) = (1 + 4z^{-1} + z^{-2})/16$. This of course leads to the same $P(z)$ as in Examples 3.3 and 3.4 (up to a scale factor).
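Verifying a candidate solution of (3.2.73) only takes polynomial arithmetic. The sketch below (helper names are ours) checks the solution given above with exact fractions and reassembles $H_1(z)$ from its polyphase components. It does not implement the Euclidean algorithm that would find such a solution in general.

```python
from fractions import Fraction as F


def pmul(a, b):
    """Product of two polynomials in z^-1 (coefficient lists)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def psub(a, b):
    """Difference of two polynomials in z^-1."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0) for i in range(n)]


def interleave(even, odd):
    """H(z) = H_even(z^2) + z^-1 H_odd(z^2) as a coefficient list."""
    h = [0] * max(2 * len(even) - 1, 2 * len(odd))
    for i, c in enumerate(even):
        h[2 * i] += c
    for i, c in enumerate(odd):
        h[2 * i + 1] += c
    return h


H00, H01 = [1, 6, 1], [4, 4]                   # polyphase components of (1 + z^-1)^4
H10, H11 = [F(1, 16), F(1, 16)], [F(1, 4)]     # the complementary solution above

det_p = psub(pmul(H00, H11), pmul(H01, H10))   # left side of (3.2.73)
h0 = interleave(H00, H01)
h1 = interleave(H10, H11)
```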

3.2.5 Filter Banks with IIR Filters

We will now concentrate on orthogonal filter banks with infinite impulse response (IIR) filters. An early study of IIR filter banks was done in [313], and further developed in [234] as well as in [269] for perfect reconstruction in the context of image coding. The main advantage of such filter banks is good frequency selectivity and low computational complexity, just like in regular IIR filtering. However, this advantage comes with a cost. Recall that in orthogonal filter banks, the synthesis filter impulse response is the time-reversed version of the analysis filter. Now if the analysis uses causal filters (with impulse response going from 0 to $+\infty$), then the synthesis has anticausal filters. This is a drawback from the point of view of implementation, since in general anticausal IIR filters cannot be implemented unless their impulse responses are truncated. However, a case where anticausal IIR filters can be implemented appears when the signal to be filtered is of finite length, a case encountered in image processing [234, 269]. IIR filter banks have been less popular because of this drawback, but their attractive features justify a brief treatment as given below. For more details, the reader is referred to [133].

First, return to the lattice factorization for FIR orthogonal filter banks (see (3.2.60)). If one substitutes an allpass section⁸ for the delay $z^{-1}$ in (3.2.60), the factorization is still paraunitary. For example, instead of the diagonal matrix used in (3.2.60), take a diagonal matrix $D(z)$ such that

$$D(z) = \begin{pmatrix} A_0(z) & 0 \\ 0 & A_1(z) \end{pmatrix}, \qquad D(z)D^T(z^{-1}) = I,$$

where we have assumed that the coefficients are real, and have used two allpass sections $A_0(z)$ and $A_1(z)$ (instead of 1 and $z^{-1}$). What is even more interesting is that such a factorization is complete [84].

Alternatively, recall that one of the ways to design orthogonal filter banks is to find an autocorrelation function $P(z)$ which is valid, that is, which satisfies

$$P(z) + P(-z) = 2, \qquad (3.2.74)$$

and then factor it into $P(z) = H_0(z)H_0(z^{-1})$. This approach is used in [133] to construct all possible orthogonal filter banks with rational filters. The method goes as follows:

First, one chooses an arbitrary polynomial $R(z)$ and forms $P(z)$ as

$$P(z) = \frac{2R(z)R(z^{-1})}{R(z)R(z^{-1}) + R(-z)R(-z^{-1})}. \qquad (3.2.75)$$

It is easy to see that this $P(z)$ satisfies (3.2.74). Since both the numerator and the denominator are autocorrelations (the latter being the sum of two autocorrelations), $P(z)$ is as well. It can be shown that any valid autocorrelation can be written as in (3.2.75) [133]. Then factor $P(z)$ as $H(z)H(z^{-1})$ and form the filter

$$H_0(z) = A_{H_0}(z)H(z),$$



8 Remember that a filter $H(e^{j\omega})$ is allpass if $|H(e^{j\omega})| = c$, $c > 0$, for all $\omega$. Here we choose $c = 1$.
where $A_{H_0}(z)$ is an arbitrary allpass. Finally choose

$$H_1(z) = z^{2k-1}H_0(-z^{-1})A_{H_1}(z), \qquad (3.2.76)$$

where $A_{H_1}(z)$ is again an arbitrary allpass. The synthesis filters are then the time-reversed analysis filters,

$$G_0(z) = H_0(z^{-1}), \qquad G_1(z) = H_1(z^{-1}). \qquad (3.2.77)$$

The above construction covers the whole spectrum of possible solutions. For example, if $R(z)R(z^{-1})$ is in itself a valid function, then

$$R(z)R(z^{-1}) + R(-z)R(-z^{-1}) = 2,$$

and by choosing $A_{H_0}$, $A_{H_1}$ to be pure delays, the solutions obtained by the above construction are FIR.

Example 3.6 Butterworth Filters 

As an example, consider a family of IIR solutions constructed in [133]. It is obtained using the above construction and imposing a maximum number of zeros at $z = -1$. Choosing $R(z) = (1 + z^{-1})^N$ in (3.2.75) gives

$$P(z) = \frac{2(z^{-1} + 2 + z)^N}{(z^{-1} + 2 + z)^N + (-z^{-1} + 2 - z)^N} = H(z)H(z^{-1}). \qquad (3.2.78)$$

These filters are the IIR counterparts of the Daubechies' filters given in Example 3.2. These are, in fact, the $N$th order half-band digital Butterworth filters [211] (see also Example 2.2). That these particular filters satisfy the conditions for orthogonality was also pointed out in [269]. The Butterworth filters are known to be the maximally flat IIR filters of a given order.

Choose $N = 5$, or $P(z)$ equals

$$P(z) = \frac{(1 + z)^5(1 + z^{-1})^5}{10z^4 + 120z^2 + 252 + 120z^{-2} + 10z^{-4}}.$$

In this case, we can obtain a closed form spectral factorization of $P(z)$, which leads to

$$H_0(z) = \frac{1 + 5z^{-1} + 10z^{-2} + 10z^{-3} + 5z^{-4} + z^{-5}}{\sqrt{2}\,(1 + 10z^{-2} + 5z^{-4})}, \qquad (3.2.79)$$

$$H_1(z) = z^{-1}\,\frac{1 - 5z + 10z^2 - 10z^3 + 5z^4 - z^5}{\sqrt{2}\,(1 + 10z^2 + 5z^4)}. \qquad (3.2.80)$$

For the purposes of implementation, it is necessary to factor $H_i(z)$ into stable causal (poles inside the unit circle) and anticausal (poles outside the unit circle) parts. For comparison with earlier designs, where length-8 FIR filters were designed, we show in Figure 3.5(d) the magnitude responses of $H_0(e^{j\omega})$ and $H_1(e^{j\omega})$ for $N = 4$. The form of $P(z)$ is then

$$P(z) = \frac{z^{-4}(1 + z)^4(1 + z^{-1})^4}{1 + 28z^{-2} + 70z^{-4} + 28z^{-6} + z^{-8}}.$$
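The closed-form factorization for $N = 5$ can be spot-checked numerically on the unit circle. The sketch below evaluates (3.2.78)-(3.2.80) directly (function names are ours) and compares $|H_0(e^{j\omega})|^2$ with $P(e^{j\omega})$. It also checks both power complementarity relations and computes the pole radii of $H_0(z)$, which shows why a split into causal and anticausal parts is needed.

```python
import cmath
import math


def H0(z):
    """Lowpass (3.2.79) for N = 5, evaluated at a complex point z."""
    w = 1 / z
    num = 1 + 5 * w + 10 * w**2 + 10 * w**3 + 5 * w**4 + w**5
    return num / (math.sqrt(2) * (1 + 10 * w**2 + 5 * w**4))


def H1(z):
    """Highpass (3.2.80) for N = 5."""
    num = 1 - 5 * z + 10 * z**2 - 10 * z**3 + 5 * z**4 - z**5
    return num / (z * math.sqrt(2) * (1 + 10 * z**2 + 5 * z**4))


def P(z):
    """(3.2.78) with N = 5, i.e. (3.2.75) for R(z) = (1 + z^-1)^5."""
    a = (1 / z + 2 + z) ** 5
    b = (-1 / z + 2 - z) ** 5
    return 2 * a / (a + b)


def on_circle(w):
    """The point z = e^{jw} on the unit circle."""
    return cmath.exp(1j * w)


# Poles of H0: z^4 + 10 z^2 + 5 = 0, so z^2 = -5 +/- sqrt(20) (both negative real)
pole_radii = sorted(math.sqrt(abs(-5 + s * math.sqrt(20))) for s in (1, -1))
```

One pole pair lies inside the unit circle and one outside, so neither a purely causal nor a purely anticausal realization of $H_0(z)$ is stable.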
As we pointed out in Proposition 3.12, there are no real FIR orthogonal symmetric/antisymmetric filter banks. However, if we allow IIR filters instead, then solutions do exist. There are two cases, depending on whether the center of symmetry/antisymmetry is at a half integer (such as in an even-length FIR linear phase filter) or at an integer (such as in the odd-length FIR case). We will only consider the former case. For discussion of the latter case as well as further details, see [133].

It can be shown that the polyphase matrix for an orthogonal, half-integer symmetric/antisymmetric filter bank is necessarily of the form

$$H_p(z) = \begin{pmatrix} A(z) & z^{-l}A(z^{-1}) \\ -z^{-n}A(z) & z^{-n-l}A(z^{-1}) \end{pmatrix},$$

where $A(z)A(z^{-1}) = 1$, that is, $A(z)$ is an allpass filter. Choosing $l = n = 0$ gives

$$H_0(z) = A(z^2) + z^{-1}A(z^{-2}), \qquad H_1(z) = -A(z^2) + z^{-1}A(z^{-2}), \qquad (3.2.81)$$

which is an orthogonal, linear phase pair. For a simple example, choose

$$A(z) = \frac{1 + 6z^{-1} + (15/7)z^{-2}}{(15/7) + 6z^{-1} + z^{-2}}. \qquad (3.2.82)$$

This particular solution will prove useful in the construction of wavelets (see Section 4.6.2). Again, for the purposes of implementation, one has to implement stable causal and anticausal parts separately.
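The properties claimed for (3.2.81)-(3.2.82) can be spot-checked on the unit circle. The sketch below (names are ours) evaluates $A(z)$, $H_0(z)$ and $H_1(z)$ at $z = e^{j\omega}$ and reports four quantities: $|A|$; the imaginary part of $e^{j\omega/2}H_0$, which is zero for a filter symmetric about $n = 1/2$; the real part of $e^{j\omega/2}H_1$, which is zero for an antisymmetric one; and $|H_0|^2 + |H_1|^2$. No normalization is applied, so the last quantity equals 4 rather than 2.

```python
import cmath
import math


def A(z):
    """Allpass filter (3.2.82)."""
    w = 1 / z
    return (1 + 6 * w + (15 / 7) * w**2) / ((15 / 7) + 6 * w + w**2)


def H0(z):
    """(3.2.81): H0(z) = A(z^2) + z^-1 A(z^-2)."""
    return A(z**2) + A(z**-2) / z


def H1(z):
    """(3.2.81): H1(z) = -A(z^2) + z^-1 A(z^-2)."""
    return -A(z**2) + A(z**-2) / z


def properties(w):
    """At z = e^{jw}: |A|, Im(e^{jw/2} H0), Re(e^{jw/2} H1), |H0|^2 + |H1|^2."""
    z = cmath.exp(1j * w)
    half = cmath.exp(0.5j * w)
    h0, h1 = H0(z), H1(z)
    return abs(A(z)), (half * h0).imag, (half * h1).real, abs(h0) ** 2 + abs(h1) ** 2


samples = [properties(math.pi * i / 100) for i in range(1, 100)]
```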

Remarks The main advantage of IIR filters is their good frequency selectivity and 
low computational complexity. The price one pays, however, is the fact that the 
filters become noncausal. For the sake of discussion, assume a finite-length signal, 
and a causal analysis filter, which will be followed by an anticausal synthesis filter. 
The output will be infinite even though the input is of finite length. One can take 
care of this problem in two ways. Either one stores the state of the filters after 
the end of the input signal and uses this as an initial state for the synthesis filters [269], or one takes advantage of the fact that the outputs of the analysis filter bank
decay rapidly after the input is zero, and stores only a finite extension of these 
signals. While the former technique is exact, the latter is usually a good enough 
approximation. This short discussion indicates that the implementation of IIR filter 
banks is less straightforward than that of their FIR counterparts, and explains their 
lesser popularity. 
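The second technique above, storing only a finite extension of the decaying analysis output, can be illustrated on a single branch. The sketch below simplifies the setting, and this simplification is our assumption: it uses one first-order causal IIR filter $1/(1 - az^{-1})$ followed by its time-reversed, anticausal counterpart $1/(1 - az)$, with no downsampling. The anticausal recursion runs backwards over the stored samples. The cascade has the closed-form impulse response $a^{|m|}/(1 - a^2)$, which gives an exact reference. Function names are hypothetical.

```python
def causal_iir(x, a, extra):
    """y[n] = x[n] + a*y[n-1] for n = 0, ..., len(x)-1+extra (zero input after the end).

    The true output is infinitely long; only `extra` samples past the input are kept."""
    y, prev = [], 0.0
    for n in range(len(x) + extra):
        prev = (x[n] if n < len(x) else 0.0) + a * prev
        y.append(prev)
    return y


def anticausal_iir(y, a):
    """u[n] = y[n] + a*u[n+1], run backwards from the end of the stored signal."""
    u, nxt = [0.0] * len(y), 0.0
    for n in reversed(range(len(y))):
        nxt = y[n] + a * nxt
        u[n] = nxt
    return u


def exact_cascade(x, a):
    """Exact cascade output on n = 0..len(x)-1: impulse response a^|m| / (1 - a^2)."""
    return [sum(x[k] * a ** abs(n - k) for k in range(len(x))) / (1 - a * a)
            for n in range(len(x))]


x = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5]
a = 0.5
for extra in (0, 10, 40):
    u = anticausal_iir(causal_iir(x, a, extra), a)[: len(x)]
    err = max(abs(p - q) for p, q in zip(u, exact_cascade(x, a)))
    print(extra, err)  # the error shrinks roughly like a^(2*extra)
```

The stored extension needed for a given accuracy grows only logarithmically with the accuracy, because the truncation error decays geometrically.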

3.3 Tree-Structured Filter Banks 

An easy way to construct multichannel filter banks is to cascade two-channel banks appropriately. One case can be seen in Figure 3.7(a), where frequency analysis is obtained by simply iterating a two-channel division on the previous lowpass channel. This is often called a constant-Q or constant relative bandwidth filter bank since the bandwidth at each channel, divided by its center frequency, is constant. It is also sometimes called a logarithmic filter bank since the channels are equal bandwidth on a logarithmic scale. We will call it an octave-band filter bank since each successive highpass output contains an octave of the input bandwidth. Another case appears when $2^J$ equal bandwidth channels are desired. This can be obtained by a $J$-step subdivision into 2 channels, that is, the two-channel bank is now iterated on both the lowpass and highpass channels. This results in a tree with $2^J$ leaves, each corresponding to $(1/2^J)$th of the original bandwidth, with a downsampling by $2^J$. Another possibility is building an arbitrary tree-structured filter bank, giving rise to wavelet packets, discussed later in this section.

Figure 3.7 An octave-band filter bank with $J$ stages. Decomposition spaces $V_i$, $W_i$ are indicated. If $h_i[n]$ is an orthogonal filter, and $g_i[n] = h_i[-n]$, the structure implements an orthogonal discrete-time wavelet series expansion. (a) Analysis part. (b) Synthesis part.

3.3.1 Octave-Band Filter Bank and Discrete-Time Wavelet Series 

Consider the filter bank given in Figure 3.7. We see that the signal is split first via a two-channel filter bank, then the lowpass version is split again using the same filter bank, and so on. It will be shown later that this structure implements a discrete-time biorthogonal wavelet series (we assume here that the two-channel filter banks are perfect reconstruction). If the two-channel filter bank is orthonormal, then it implements an orthonormal discrete-time wavelet series.⁹

Recall that the basis functions of the discrete-time expansion are given by the 
impulse responses of the synthesis filters. Therefore, we will concentrate on the 
synthesis filter bank (even though, in the orthogonal case, simple time reversal 
relates analysis and synthesis filters). Let us start with a simple example which 
should highlight the main features of octave-band filter bank expansions. 

Example 3.7 

Consider what happens if the filters $g_i[n]$ from Figure 3.7(a)-(b) are Haar filters defined in z-transform domain as

$$G_0(z) = \frac{1}{\sqrt{2}}\left(1 + z^{-1}\right), \qquad G_1(z) = \frac{1}{\sqrt{2}}\left(1 - z^{-1}\right).$$

Take, for example, $J = 3$, that is, we will use three two-channel filter banks. Then, using the multirate identity which says that $G(z)$ followed by upsampling by 2 is equivalent to upsampling by 2 followed by $G(z^2)$ (see Section 2.5.3), we can transform this filter bank into a four-channel one as given in Figure 3.8. The equivalent filters are

$$\begin{aligned}
G_1^{(1)}(z) &= G_1(z) = \frac{1}{\sqrt{2}}\left(1 - z^{-1}\right), \\
G_1^{(2)}(z) &= G_0(z)\, G_1(z^2) = \frac{1}{2}\left(1 + z^{-1} - z^{-2} - z^{-3}\right), \\
G_1^{(3)}(z) &= G_0(z)\, G_0(z^2)\, G_1(z^4) = \frac{1}{2\sqrt{2}}\left(1 + z^{-1} + z^{-2} + z^{-3} - z^{-4} - z^{-5} - z^{-6} - z^{-7}\right), \\
G_0^{(3)}(z) &= G_0(z)\, G_0(z^2)\, G_0(z^4) = \frac{1}{2\sqrt{2}}\left(1 + z^{-1} + z^{-2} + z^{-3} + z^{-4} + z^{-5} + z^{-6} + z^{-7}\right),
\end{aligned}$$

preceded by upsampling by 2, 4, 8 and 8, respectively. The impulse responses follow by inverse z-transform. Denote by $g_0^{(j)}[n]$ the equivalent filter obtained by going through $j$ stages of lowpass filters $g_0[n]$, each preceded by upsampling by 2. It can be defined recursively as (we give it in z-domain for simplicity)

$$G_0^{(j)}(z) = G_0\big(z^{2^{j-1}}\big)\, G_0^{(j-1)}(z) = \prod_{k=0}^{j-1} G_0\big(z^{2^k}\big), \qquad j = 1, 2, 3.$$

Note that this implies that $G_0^{(1)}(z) = G_0(z)$ (with the convention $G_0^{(0)}(z) = 1$). On the other hand, we denote by $g_1^{(j)}[n]$ the equivalent filter corresponding to highpass filtering followed by $(j-1)$ stages of lowpass filtering, each again preceded by upsampling by 2. It can be defined recursively as

$$G_1^{(j)}(z) = G_1\big(z^{2^{j-1}}\big)\, G_0^{(j-1)}(z) = G_1\big(z^{2^{j-1}}\big) \prod_{k=0}^{j-2} G_0\big(z^{2^k}\big), \qquad j = 1, 2, 3.$$

Figure 3.8 Octave-band synthesis filter bank with Haar filters and three stages. It is obtained by transforming the filter bank from Figure 3.7(b) using the multirate identity for filtering followed by upsampling.

⁹ This is also sometimes called a discrete-time wavelet transform in the literature.



Since this is an orthonormal system, the time-domain matrices representing analysis and synthesis are just transposes of each other. Thus the analysis matrix $T_a$ representing the actions of the filters $h_1^{(1)}[n]$, $h_1^{(2)}[n]$, $h_1^{(3)}[n]$, $h_0^{(3)}[n]$ contains as rows the impulse responses of $g_1^{(1)}[n]$, $g_1^{(2)}[n]$, $g_1^{(3)}[n]$, and $g_0^{(3)}[n]$, or of $h_i^{(j)}[-n]$, since analysis and synthesis filters are linked by time reversal. The matrix $T_a$ is block-diagonal,

$$T_a = \begin{pmatrix} \ddots & & & \\ & A_0 & & \\ & & A_0 & \\ & & & \ddots \end{pmatrix}, \qquad (3.3.1)$$
where the block $A_0$ is of the following form:

$$A_0 = \frac{1}{2\sqrt{2}} \begin{pmatrix}
2 & -2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & -2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 2 & -2 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 2 & -2 \\
\sqrt{2} & \sqrt{2} & -\sqrt{2} & -\sqrt{2} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \sqrt{2} & \sqrt{2} & -\sqrt{2} & -\sqrt{2} \\
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1
\end{pmatrix}. \qquad (3.3.2)$$



Note how this matrix reflects the fact that the filter $g_1^{(1)}[n]$ is preceded by upsampling by 2 (the row $(2\ \ {-2})$ is shifted by 2 each time and appears 4 times in the matrix). Similarly, $g_1^{(2)}[n]$ is preceded by upsampling by 4 (the corresponding row is shifted by 4 and appears twice), while the filters $g_1^{(3)}[n]$ and $g_0^{(3)}[n]$ are preceded by upsampling by 8 (the corresponding rows appear only once in the matrix). Note that the ordering of the rows in (3.3.2) is somewhat arbitrary; we simply gathered successive impulse responses for clarity.
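The block $A_0$ of (3.3.2) can be assembled directly from the Haar impulse responses of Example 3.7, and its unitarity checked by computing $A_0 A_0^T$. A minimal sketch in plain Python (the helper names are hypothetical):

```python
import math

S = 1 / math.sqrt(2)
# Haar equivalent synthesis filters of Example 3.7 (impulse responses).
G1_1 = [S, -S]                          # g_1^(1)[n], preceded by upsampling by 2
G1_2 = [0.5, 0.5, -0.5, -0.5]           # g_1^(2)[n], preceded by upsampling by 4
G1_3 = [S / 2] * 4 + [-S / 2] * 4       # g_1^(3)[n], preceded by upsampling by 8
G0_3 = [S / 2] * 8                      # g_0^(3)[n], preceded by upsampling by 8


def shifted_row(h, shift, n=8):
    """Length-n row holding the taps of h starting at column `shift`."""
    row = [0.0] * n
    for i, v in enumerate(h):
        row[shift + i] = v
    return row


def build_A0():
    """Rows of A_0 in the order used in (3.3.2)."""
    rows = [shifted_row(G1_1, 2 * k) for k in range(4)]
    rows += [shifted_row(G1_2, 4 * k) for k in range(2)]
    rows += [shifted_row(G1_3, 0), shifted_row(G0_3, 0)]
    return rows


def gram(A):
    """A A^T computed with plain lists."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in A] for r1 in A]


A0 = build_A0()
G = gram(A0)
max_err = max(abs(G[i][j] - (1.0 if i == j else 0.0))
              for i in range(8) for j in range(8))
print(f"max |A0 A0^T - I| = {max_err:.1e}")
```

Since $A_0$ is square, $A_0 A_0^T = I$ also implies $A_0^T A_0 = I$.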



Now that we have seen how it works in a simple case, we take more general filters $g_i[n]$, and a number of stages $J$. We concentrate on the orthonormal case (the biorthogonal one would follow similarly). In an orthonormal octave-band filter bank with $J$ stages, the equivalent filters (basis functions) are given by (again we give them in z-domain for simplicity)

$$G_0^{(j)}(z) = G_0^{(j-1)}(z)\, G_0\big(z^{2^{j-1}}\big) = \prod_{K=0}^{j-1} G_0\big(z^{2^K}\big), \qquad (3.3.3)$$

$$G_1^{(j)}(z) = G_0^{(j-1)}(z)\, G_1\big(z^{2^{j-1}}\big) = G_1\big(z^{2^{j-1}}\big) \prod_{K=0}^{j-2} G_0\big(z^{2^K}\big), \qquad j = 1, \dots, J. \qquad (3.3.4)$$
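The recursions (3.3.3)-(3.3.4) translate directly into operations on coefficient lists: $G(z^{2^{j-1}})$ is obtained by inserting $2^{j-1} - 1$ zeros between taps, and a product of z-transforms is a convolution. A minimal sketch (function names are hypothetical), checked against the Haar filters of Example 3.7:

```python
import math


def poly_mul(a, b):
    """Product of two polynomials in z^{-1}, given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for k, bk in enumerate(b):
            out[i + k] += ai * bk
    return out


def upsample(a, m):
    """Coefficients of A(z^m): m - 1 zeros inserted between taps."""
    out = [0.0] * ((len(a) - 1) * m + 1)
    for i, ai in enumerate(a):
        out[i * m] = ai
    return out


def equivalent_filters(g0, g1, J):
    """Impulse responses of G_0^{(j)} and G_1^{(j)}, j = 1..J, via the
    recursions (3.3.3)-(3.3.4), starting from G_0^{(0)}(z) = 1."""
    G0, G1 = {}, {}
    prev = [1.0]
    for j in range(1, J + 1):
        m = 2 ** (j - 1)
        G1[j] = poly_mul(prev, upsample(g1, m))
        G0[j] = poly_mul(prev, upsample(g0, m))
        prev = G0[j]
    return G0, G1


S = 1 / math.sqrt(2)
G0, G1 = equivalent_filters([S, S], [S, -S], 3)
# Scaled by 2*sqrt(2), g_1^(3) should be (1, 1, 1, 1, -1, -1, -1, -1).
print([round(v * 2 * math.sqrt(2), 12) for v in G1[3]])
```

The same function accepts any pair of (finite-length) two-channel filters, so it can be reused for longer orthonormal or biorthogonal filters.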



In time domain, each of the outputs in Figure 3.7(a) can be described as

$$H_1 H_0^{\,j-1}\, x, \qquad j = 1, \dots, J,$$

except for the last (lowpass) output, which is obtained by

$$H_0^{\,J}\, x.$$

Here, the time-domain matrices $H_0$, $H_1$ are as defined in Section 3.2.1, that is, each row is an even shift of the impulse response of $g_i[n]$, or equivalently, of $h_i[-n]$. Since each stage in the analysis bank is orthonormal and invertible, the overall scheme is as well. Thus, we get a unitary analysis matrix $T_a$ by interleaving the rows of $H_1$, $H_1 H_0$, ..., $H_1 H_0^{J-1}$, $H_0^J$, as was done in (3.3.1)-(3.3.2). A formal proof of this statement will be given in Section 3.3.2 under orthogonality of basis functions.
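Example 3.8

Let us go back to the Haar case and three stages. We can form matrices $H_1$, $H_1 H_0$, $H_1 H_0^2$, and $H_0^3$ as

$$H_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} & \vdots & & & & \\ \cdots & 1 & -1 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 1 & -1 & \cdots \\ & & & \vdots & & \end{pmatrix}, \qquad (3.3.5)$$

$$H_0 = \frac{1}{\sqrt{2}} \begin{pmatrix} & \vdots & & & & \\ \cdots & 1 & 1 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 1 & 1 & \cdots \\ & & & \vdots & & \end{pmatrix}, \qquad (3.3.6)$$

$$H_1 H_0 = \frac{1}{2} \begin{pmatrix} & \vdots & & & & & & & & \\ \cdots & 1 & 1 & -1 & -1 & 0 & 0 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 1 & 1 & -1 & -1 & \cdots \\ & & & & & \vdots & & & & \end{pmatrix}, \qquad (3.3.7)$$

$$H_1 H_0^2 = \frac{1}{2\sqrt{2}} \begin{pmatrix} & \vdots & & & & & & & & & & \\ \cdots & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & \cdots \\ & & & & & & \vdots & & & & & \end{pmatrix}, \qquad (3.3.8)$$

$$H_0^3 = \frac{1}{2\sqrt{2}} \begin{pmatrix} & \vdots & & & & & & & & & & \\ \cdots & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & \cdots \\ \cdots & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & \cdots \\ & & & & & & \vdots & & & & & \end{pmatrix}. \qquad (3.3.9)$$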



Now, it is easy to see that by interleaving the rows of $H_1$, $H_1 H_0$, $H_1 H_0^2$, and $H_0^3$ from (3.3.5)-(3.3.9), we obtain the matrix $T_a$ as in (3.3.1)-(3.3.2). To check that it is unitary, it is enough to check that $A_0$ is unitary (which it is; just compute the product $A_0 A_0^T$).
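The same matrix can be obtained in operator form, by multiplying the one-stage matrices $H_0$, $H_1$ as in Example 3.8. The sketch below works on length-8 signals and, as an assumption, uses periodic (circular) extension so that the operators become finite matrices; for Haar filters on length 8 no wrap-around actually occurs. It stacks the rows of $H_1$, $H_1H_0$, $H_1H_0^2$, $H_0^3$ in groups rather than interleaving them, which is only a row permutation and does not affect unitarity. Helper names are hypothetical.

```python
import math


def analysis_operator(g, n_in):
    """Matrix of 'filter with h[n] = g[-n], then downsample by 2' acting on
    length-n_in signals with periodic extension (an assumption): row k
    holds the impulse response g shifted by 2k."""
    rows = []
    for k in range(n_in // 2):
        row = [0.0] * n_in
        for i, v in enumerate(g):
            row[(2 * k + i) % n_in] += v
        rows.append(row)
    return rows


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


S = 1 / math.sqrt(2)
g0, g1 = [S, S], [S, -S]
H0 = {n: analysis_operator(g0, n) for n in (8, 4, 2)}
H1 = {n: analysis_operator(g1, n) for n in (8, 4, 2)}

H0_2 = matmul(H0[4], H0[8])                 # H_0^2 on length-8 inputs
blocks = [
    H1[8],                                  # H_1          (4 rows)
    matmul(H1[4], H0[8]),                   # H_1 H_0      (2 rows)
    matmul(H1[2], H0_2),                    # H_1 H_0^2    (1 row)
    matmul(H0[2], H0_2),                    # H_0^3        (1 row)
]
Ta = [row for block in blocks for row in block]   # grouped, not interleaved

gram = matmul(Ta, [list(col) for col in zip(*Ta)])
max_err = max(abs(gram[i][j] - (i == j)) for i in range(8) for j in range(8))
print(f"{len(Ta)} x {len(Ta[0])} analysis matrix, max |Ta Ta^T - I| = {max_err:.1e}")
```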



Until now, we have concentrated on the orthonormal case. If we relax the orthonormality constraint, we obtain a biorthogonal tree-structured filter bank. Now, $h_i[n]$ and $g_i[n]$ are not related by simple time reversal, but are impulse responses of a biorthogonal perfect reconstruction filter bank. We therefore have both equivalent synthesis filters $g_1^{(j)}[n - 2^j k]$, $g_0^{(J)}[n - 2^J k]$ as given in (3.3.3)-(3.3.4), and analysis filters $h_1^{(j)}[n - 2^j k]$, $h_0^{(J)}[n - 2^J k]$, which are defined similarly. Therefore, if the individual two-channel filter banks are biorthogonal (perfect reconstruction), then the overall scheme is as well. The proof of this statement will follow the proof for the orthonormal case (see Section 3.3.2 for the discrete-time wavelet series case), and is left as an exercise to the reader.
3.3.2 Discrete-Time Wavelet Series and Its Properties

What was obtained in the last section is called a discrete-time wavelet series. It 
should be noted that this is not an exact equivalent of the continuous-time wavelet 
transform or series discussed in Chapter 4. In continuous time, there is a single 
wavelet involved, whereas in the discrete-time case, there are different iterated 
filters. 

At the risk of a slight redundancy, we go once more through the whole process leading to the discrete-time wavelet series. Consider a two-channel orthogonal filter bank with filters $h_0[n]$, $h_1[n]$, $g_0[n]$ and $g_1[n]$, where $h_i[n] = g_i[-n]$. Then, the input signal can be written as

$$x[n] = \sum_{k\in\mathbb{Z}} X^{(1)}[2k+1]\, g_1^{(1)}[n - 2^1 k] + \sum_{k\in\mathbb{Z}} X^{(1)}[2k]\, g_0^{(1)}[n - 2^1 k], \qquad (3.3.10)$$

where

$$X^{(1)}[2k] = \langle h_0^{(1)}[2^1 k - l],\, x[l] \rangle, \qquad X^{(1)}[2k+1] = \langle h_1^{(1)}[2^1 k - l],\, x[l] \rangle$$

are the convolutions of the input with $h_0[n]$ and $h_1[n]$ evaluated at even indexes $2k$. In these equations $h_i^{(1)}[n] = h_i[n]$, and $g_i^{(1)}[n] = g_i[n]$. In an octave-band filter bank or discrete-time wavelet series, the lowpass channel is further split by lowpass/highpass filtering and downsampling. Then, the first term on the right side of (3.3.10) remains unchanged, while the second can be expressed as

$$\sum_{k\in\mathbb{Z}} X^{(1)}[2k]\, g_0^{(1)}[n - 2^1 k] = \sum_{k\in\mathbb{Z}} X^{(2)}[2k+1]\, g_1^{(2)}[n - 2^2 k] + \sum_{k\in\mathbb{Z}} X^{(2)}[2k]\, g_0^{(2)}[n - 2^2 k], \qquad (3.3.11)$$

where

$$X^{(2)}[2k] = \langle h_0^{(2)}[2^2 k - l],\, x[l] \rangle, \qquad X^{(2)}[2k+1] = \langle h_1^{(2)}[2^2 k - l],\, x[l] \rangle,$$

that is, we applied (3.3.10) once more. In the above, the basis functions $g_i^{(2)}[n]$ are as defined in (3.3.3) and (3.3.4). In other words, $g_0^{(2)}[n]$ is the time-domain version of

$$G_0^{(2)}(z) = G_0(z)\, G_0(z^2),$$

while $g_1^{(2)}[n]$ is the time-domain version of

$$G_1^{(2)}(z) = G_0(z)\, G_1(z^2).$$
Figure 3.9 Dyadic sampling grid used in the discrete-time wavelet series. The shifts of the basis functions $g_1^{(j)}$ are shown, as well as $g_0^{(J)}$ (case $J = 4$ is shown). This corresponds to the "sampling" of the discrete-time wavelet series. Note the conservation of the number of samples between the signal and transform domains.



With (3.3.11), the input signal $x[n]$ in (3.3.10) can be written as

$$x[n] = \sum_{k\in\mathbb{Z}} X^{(1)}[2k+1]\, g_1^{(1)}[n - 2^1 k] + \sum_{k\in\mathbb{Z}} X^{(2)}[2k+1]\, g_1^{(2)}[n - 2^2 k] + \sum_{k\in\mathbb{Z}} X^{(2)}[2k]\, g_0^{(2)}[n - 2^2 k]. \qquad (3.3.12)$$

Repeating the process in (3.3.12) $J$ times, one obtains the discrete-time wavelet series over $J$ octaves, plus the final octave containing the lowpass version. Thus, (3.3.12) becomes

$$x[n] = \sum_{j=1}^{J} \sum_{k\in\mathbb{Z}} X^{(j)}[2k+1]\, g_1^{(j)}[n - 2^j k] + \sum_{k\in\mathbb{Z}} X^{(J)}[2k]\, g_0^{(J)}[n - 2^J k], \qquad (3.3.13)$$

where

$$X^{(j)}[2k+1] = \langle h_1^{(j)}[2^j k - l],\, x[l] \rangle, \quad j = 1, \dots, J, \qquad X^{(J)}[2k] = \langle h_0^{(J)}[2^J k - l],\, x[l] \rangle. \qquad (3.3.14)$$

In (3.3.13) the sequence $g_1^{(j)}[n]$ is the time-domain version of (3.3.4), while $g_0^{(J)}[n]$ is the time-domain version of (3.3.3), and $h_i^{(j)}[n] = g_i^{(j)}[-n]$. Because any input sequence can be decomposed as in (3.3.13), and since these functions are orthonormal (Proposition 3.15 below), the family of functions $\{g_1^{(j)}[n - 2^j k],\, g_0^{(J)}[n - 2^J k]\}$, $j = 1, \dots, J$, and $k, n \in \mathbb{Z}$, is an orthonormal basis for $\ell^2(\mathbb{Z})$.
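A minimal sketch of the expansion (3.3.13)-(3.3.14), computed the fast way by iterating a two-channel analysis on the lowpass channel. It assumes a finite signal with periodic (circular) extension, and uses Daubechies' length-4 orthonormal filter as an example $g_0[n]$, with $g_1[n] = (-1)^n g_0[L-1-n]$ (an even delay of (3.3.21)); neither choice is made in the text here, and the helper names are hypothetical. Synthesis is the transpose of analysis, so both the reconstruction error and the energy difference (compare Parseval's equality among the properties listed in this section) should be at round-off level.

```python
import math

R3, C = math.sqrt(3), 4 * math.sqrt(2)
# Assumed example filters: Daubechies' length-4 orthonormal lowpass and
# the highpass g1[n] = (-1)^n g0[L-1-n].
G0 = [(1 + R3) / C, (3 + R3) / C, (3 - R3) / C, (1 - R3) / C]
G1 = [(-1) ** n * G0[len(G0) - 1 - n] for n in range(len(G0))]


def analysis_step(x, g0, g1):
    """Inner products with even circular shifts of g0 and g1."""
    n = len(x)
    lo = [sum(g * x[(2 * k + i) % n] for i, g in enumerate(g0)) for k in range(n // 2)]
    hi = [sum(g * x[(2 * k + i) % n] for i, g in enumerate(g1)) for k in range(n // 2)]
    return lo, hi


def synthesis_step(lo, hi, g0, g1):
    """Transpose of analysis_step: sum of the shifted basis functions."""
    n = 2 * len(lo)
    x = [0.0] * n
    for k in range(len(lo)):
        for i in range(len(g0)):
            x[(2 * k + i) % n] += g0[i] * lo[k] + g1[i] * hi[k]
    return x


def dtws(x, J):
    """J-octave analysis: ([X^(1) highpass, ..., X^(J) highpass], X^(J) lowpass)."""
    details, lo = [], list(x)
    for _ in range(J):
        lo, hi = analysis_step(lo, G0, G1)
        details.append(hi)
    return details, lo


def inverse_dtws(details, lo):
    for hi in reversed(details):
        lo = synthesis_step(lo, hi, G0, G1)
    return lo


x = [math.sin(0.3 * n) + (n % 5) for n in range(16)]
details, approx = dtws(x, 3)
x_rec = inverse_dtws(details, approx)
energy_x = sum(v * v for v in x)
energy_coeffs = sum(v * v for d in details for v in d) + sum(v * v for v in approx)
print(max(abs(a - b) for a, b in zip(x, x_rec)), energy_x - energy_coeffs)
```

Note how the 16 input samples map to $8 + 4 + 2$ detail coefficients plus 2 lowpass coefficients, matching the dyadic grid of Figure 3.9.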

Note the special sampling used in the discrete-time wavelet series. Each subsequent channel is downsampled by 2 with respect to the previous one and has a bandwidth that is reduced by 2 as well. This is called a dyadic sampling grid, as shown in Figure 3.9.

Let us now list a few properties of the discrete-time wavelet series (orthonormal 
and dyadic). 

Linearity Since the discrete-time wavelet series involves inner products or convo- 
lutions (which are linear operators) it is obviously linear. 

Shift Recall that multirate systems are not shift-invariant in general, and two-channel filter banks downsampled by 2 are shift-invariant with respect to even shifts only. Therefore, it is intuitive that a $J$-octave discrete-time wavelet series will be invariant under shifts by multiples of $2^J$. A visual interpretation follows from the fact that the dyadic grid in Figure 3.9, when moved by $k2^J$, will overlap with itself, whereas it will not if the shift is a noninteger multiple of $2^J$.

Proposition 3.14

In a discrete-time wavelet series expansion over $J$ octaves, if

$$x[l] \longleftrightarrow X^{(j)}[2k+1], \qquad j = 1, 2, \dots, J,$$

then

$$x[l - m2^J] \longleftrightarrow X^{(j)}[2(k - m2^{J-j}) + 1].$$

Proof

If $y[l] = x[l - m2^J]$, then its transform is, following (3.3.14),

$$\begin{aligned}
Y^{(j)}[2k+1] &= \langle h_1^{(j)}[2^j k - l],\, x[l - m2^J] \rangle \\
&= \langle h_1^{(j)}[2^j k - l' - m2^J],\, x[l'] \rangle \\
&= X^{(j)}[2(k - m2^{J-j}) + 1].
\end{aligned}$$

Very similarly, one proves for the lowpass channel that, when $x[l]$ produces $X^{(J)}[2k]$, then $x[l - m2^J]$ leads to $X^{(J)}[2(k - m)]$.
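To see Proposition 3.14 at work, the sketch below uses Haar filters and a length-32 signal with circular shifts (both are assumptions for illustration; the helper names are hypothetical). A delay by $m2^J$ should shift channel $j$ by $m2^{J-j}$ and the lowpass channel by $m$. A delay by 1, in contrast, changes the energy in the first octave, which rules out any shifted copy of the original coefficients, since a shift preserves energy.

```python
import math

S = 1 / math.sqrt(2)


def haar_step(x):
    """One Haar analysis stage: (lowpass, highpass), each downsampled by 2."""
    lo = [S * (x[2 * k] + x[2 * k + 1]) for k in range(len(x) // 2)]
    hi = [S * (x[2 * k] - x[2 * k + 1]) for k in range(len(x) // 2)]
    return lo, hi


def dtws(x, J):
    details, lo = [], list(x)
    for _ in range(J):
        lo, hi = haar_step(lo)
        details.append(hi)
    return details, lo


def delay(v, s):
    """Circular delay: out[l] = v[(l - s) mod len(v)]."""
    return [v[(l - s) % len(v)] for l in range(len(v))]


def close(a, b, tol=1e-12):
    return len(a) == len(b) and all(abs(p - q) < tol for p, q in zip(a, b))


J, m = 3, 1
x = [float((7 * l) % 11) for l in range(32)]
dx, ax = dtws(x, J)

# Delay by m * 2^J: channel j shifts by m * 2^(J - j), the lowpass by m.
dy, ay = dtws(delay(x, m * 2 ** J), J)
shift_ok = all(close(dy[j - 1], delay(dx[j - 1], m * 2 ** (J - j)))
               for j in range(1, J + 1)) and close(ay, delay(ax, m))

# Delay by 1: the energy of X^(1) changes, so it is not a shifted copy.
d1, _ = dtws(delay(x, 1), J)
e_orig = sum(v * v for v in dx[0])
e_odd = sum(v * v for v in d1[0])
print(shift_ok, e_orig, e_odd)
```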

Orthogonality We have mentioned before that $g_0^{(J)}[n]$ and $g_1^{(j)}[n]$, $j = 1, \dots, J$, with appropriate shifts, form an orthonormal family of functions (see [274]). This stems from the fact that we have used two-channel orthogonal filter banks, for which we know that

$$\langle g_i[n - 2k],\, g_j[n - 2l] \rangle = \delta[i - j]\, \delta[k - l].$$
Proposition 3.15

In a discrete-time wavelet series expansion, the following orthogonality relations hold:

$$\langle g_0^{(J)}[n - 2^J k],\, g_0^{(J)}[n - 2^J l] \rangle = \delta[k - l], \qquad (3.3.15)$$

$$\langle g_1^{(j)}[n - 2^j k],\, g_1^{(i)}[n - 2^i l] \rangle = \delta[i - j]\, \delta[k - l], \qquad (3.3.16)$$

$$\langle g_0^{(J)}[n - 2^J k],\, g_1^{(j)}[n - 2^j l] \rangle = 0. \qquad (3.3.17)$$



Proof

We will here prove only (3.3.15), while (3.3.16) and (3.3.17) are left as an exercise to the reader (see Problem 3.15). We prove (3.3.15) by induction.

It will be convenient to work with the z-transform of the autocorrelation of the filter $G_0^{(j)}(z)$, which we call $P^{(j)}(z)$ and equals

$$P^{(j)}(z) = G_0^{(j)}(z)\, G_0^{(j)}(z^{-1}).$$

Recall that because of the orthogonality of $g_0[n]$ with respect to even shifts, we have that

$$P^{(1)}(z) + P^{(1)}(-z) = 2,$$

or, equivalently, that the polyphase decomposition of $P^{(1)}(z)$ is of the form

$$P^{(1)}(z) = 1 + z\, P_1^{(1)}(z^2).$$

This is the initial step for our induction. Now, assume that $g_0^{(j)}[n]$ is orthogonal to its translates by $2^j$. Therefore, the polyphase decomposition of its autocorrelation can be written as

$$P^{(j)}(z) = 1 + \sum_{i=1}^{2^j - 1} z^i\, P_i^{(j)}\big(z^{2^j}\big).$$

Now, because of the recursion (3.3.3), the autocorrelation of $G_0^{(j+1)}(z)$ equals

$$P^{(j+1)}(z) = P^{(j)}(z)\, P^{(1)}\big(z^{2^j}\big).$$

Expanding both terms on the right-hand side, we get

$$P^{(j+1)}(z) = \Big[ 1 + \sum_{i=1}^{2^j - 1} z^i\, P_i^{(j)}\big(z^{2^j}\big) \Big] \Big[ 1 + z^{2^j}\, P_1^{(1)}\big(z^{2^{j+1}}\big) \Big].$$

We need to verify that the 0th polyphase component of $P^{(j+1)}(z)$ is equal to 1, or that the coefficients of the powers of $z$ that are nonzero multiples of $2^{j+1}$ are 0. Out of the four products that appear when multiplying out the above right-hand side, the product $1 \cdot 1$ gives the constant 1, the term $z^{2^j} P_1^{(1)}(z^{2^{j+1}})$ contains only powers congruent to $2^j$ modulo $2^{j+1}$, and the sum $\sum_i z^i P_i^{(j)}(z^{2^j})$ contains only powers that are not multiples of $2^j$. Thus only the product involving both polyphase sums needs to be considered,

$$\sum_{i=1}^{2^j - 1} z^i\, P_i^{(j)}\big(z^{2^j}\big) \cdot z^{2^j}\, P_1^{(1)}\big(z^{2^{j+1}}\big).$$

The powers of $z$ appearing in the above product are of the form $l = i + k2^j + 2^j + m2^{j+1}$, where $i = 1, \dots, 2^j - 1$ and $k, m \in \mathbb{Z}$. Thus, $l$ cannot be a multiple of $2^{j+1}$, and we have shown that

$$P^{(j+1)}(z) = 1 + \sum_{i=1}^{2^{j+1} - 1} z^i\, P_i^{(j+1)}\big(z^{2^{j+1}}\big),$$

thus completing the proof.
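The statement (3.3.15) is easy to check numerically: the autocorrelation of $g_0^{(j)}[n]$, built with the recursion (3.3.3), should equal 1 at lag 0 and vanish at all other multiples of $2^j$, while other lags need not vanish. The sketch below assumes Daubechies' length-4 filter as $g_0[n]$ (any orthonormal lowpass filter would do), and its helper names are hypothetical.

```python
import math


def poly_mul(a, b):
    """Product of polynomials in z^{-1} (coefficient lists)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for k, bk in enumerate(b):
            out[i + k] += ai * bk
    return out


def upsample(a, m):
    """Coefficients of A(z^m)."""
    out = [0.0] * ((len(a) - 1) * m + 1)
    for i, ai in enumerate(a):
        out[i * m] = ai
    return out


def autocorr(g, lag):
    """Coefficient p[lag] = sum_n g[n] g[n + lag] of P(z) = G(z) G(z^{-1})."""
    return sum(g[n] * g[n + lag] for n in range(len(g)) if 0 <= n + lag < len(g))


R3, C = math.sqrt(3), 4 * math.sqrt(2)
g0 = [(1 + R3) / C, (3 + R3) / C, (3 - R3) / C, (1 - R3) / C]   # assumed example

g = [1.0]                                             # g_0^{(0)} = delta
worst = {}
for j in range(1, 5):
    g = poly_mul(g, upsample(g0, 2 ** (j - 1)))       # recursion (3.3.3)
    step = 2 ** j
    worst[j] = max(abs(autocorr(g, lag) - (1.0 if lag == 0 else 0.0))
                   for lag in range(0, len(g), step))
print(worst)   # deviations from delta at lags that are multiples of 2^j
```

Only nonnegative lags are checked because the autocorrelation is symmetric.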

Parseval's Equality Orthogonality together with completeness (which follows from perfect reconstruction) leads to conservation of energy, also called Bessel's or Parseval's equality, that is

$$\|x\|^2 = \sum_{k\in\mathbb{Z}} \Big( |X^{(J)}[2k]|^2 + \sum_{j=1}^{J} |X^{(j)}[2k+1]|^2 \Big).$$

3.3.3 Multiresolution Interpretation of Octave-Band Filter Banks 

The two-channel filter banks studied in Sections 3.1 and 3.2 have the property 
of splitting the signal into two lower-resolution versions. One was a lowpass or 
coarse resolution version, and the other was a highpass version of the input. Then, 
in this section, we have applied this decomposition recursively on the lowpass or 
coarse version. This leads to a hierarchy of resolutions, also called a multiresolution 
decomposition. 

Actually, in computer vision as well as in image processing, looking at signals at 
various resolutions has been around for quite some time. In 1983, Burt and Adelson
introduced the pyramid coding technique, which builds up a signal from its lower-resolution
version plus a sequence of details (see also Section 3.5.2) [41]. In fact, one
of the first links between wavelet theory and signal processing was Daubechies' [71] 
and Mallat's [180] recognition that the scheme of Burt and Adelson is closely related 
to wavelet theory and multiresolution analysis, and that filter banks or subband 
coding schemes can be used for the computation of wavelet decompositions. While 
these relations will be further explored in Chapter 4 for the continuous-time wavelet 
series, here we study the discrete-time wavelet series or its octave-band filter bank 
realization. This discrete-time multiresolution analysis was studied by Rioul [240]. 

Since this is a formalization of earlier concepts, we need some definitions. First we introduce the concept of embedded closed spaces. We will say that the space $V_0$ is the space of all square-summable sequences, that is,

$$V_0 = \ell^2(\mathbb{Z}). \qquad (3.3.18)$$

Then, a multiresolution analysis consists of a sequence of embedded closed spaces

$$V_J \subset \cdots \subset V_2 \subset V_1 \subset V_0. \qquad (3.3.19)$$

It is obvious that due to (3.3.18)-(3.3.19)

$$\bigcup_{j=0}^{J} V_j = V_0 = \ell^2(\mathbb{Z}).$$

The orthogonal complement of $V_{j+1}$ in $V_j$ will be denoted by $W_{j+1}$, and thus

$$V_j = V_{j+1} \oplus W_{j+1}, \qquad (3.3.20)$$

with $V_{j+1} \perp W_{j+1}$, where $\oplus$ denotes the direct sum (see Section 2.2.2). Assume that there exists a sequence $g_0[n] \in V_0$ such that

$$\{g_0[n - 2k]\}_{k\in\mathbb{Z}}$$

is an orthonormal basis for $V_1$. Then, it can be shown that there exists a sequence $g_1[n] \in V_0$ such that

$$\{g_1[n - 2k]\}_{k\in\mathbb{Z}}$$

is an orthonormal basis for $W_1$. Such a sequence is given by

$$g_1[n] = (-1)^n\, g_0[-n + 1]. \qquad (3.3.21)$$

In other words, and having in mind (3.3.20), $\{g_0[n - 2k], g_1[n - 2k]\}_{k\in\mathbb{Z}}$ is an orthonormal basis for $V_0$. This splitting can be iterated on $V_1$. Therefore, one can see that $V_0$ can be decomposed in the following manner:

$$V_0 = W_1 \oplus W_2 \oplus \cdots \oplus W_J \oplus V_J, \qquad (3.3.22)$$

by simply iterating the decomposition $J$ times.
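A small numerical check of (3.3.21), assuming Daubechies' length-4 filter as $g_0[n]$ and storing finitely supported sequences as dictionaries indexed by $n$ (both choices, and the helper names, are illustrative assumptions). With $g_1[n] = (-1)^n g_0[-n+1]$, the even shifts of $g_0$ and $g_1$ should be mutually orthogonal, and the even shifts of each should be orthonormal; this is the orthonormality part of the claim that $\{g_0[n-2k], g_1[n-2k]\}$ is an orthonormal basis for $V_0$ (completeness is not checked here).

```python
import math

R3, C = math.sqrt(3), 4 * math.sqrt(2)
# Assumed example: Daubechies' length-4 lowpass filter, stored as {n: g0[n]}.
g0 = {0: (1 + R3) / C, 1: (3 + R3) / C, 2: (3 - R3) / C, 3: (1 - R3) / C}

# (3.3.21): g1[n] = (-1)^n g0[-n + 1]; with m = 1 - n this is supported on n = -2..1.
g1 = {1 - m: (-1) ** (1 - m) * v for m, v in g0.items()}


def inner(a, b, sa, sb):
    """<a[n - sa], b[n - sb]> for finitely supported sequences stored as dicts."""
    return sum(v * b.get(m + sa - sb, 0.0) for m, v in a.items())


shifts = range(-4, 5)
cross = max(abs(inner(g0, g1, 2 * k, 2 * l)) for k in shifts for l in shifts)
ortho1 = max(abs(inner(g1, g1, 2 * k, 2 * l) - (k == l)) for k in shifts for l in shifts)
ortho0 = max(abs(inner(g0, g0, 2 * k, 2 * l) - (k == l)) for k in shifts for l in shifts)
print(cross, ortho1, ortho0)
```

The cross-orthogonality does not depend on the particular $g_0[n]$: for even shifts, the factor $(-1)^n$ makes the terms of the inner product cancel in pairs.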

Now, consider the octave-band filter bank in Figure 3.7(a). The analysis filters are the time-reversed versions of $g_0[n]$ and $g_1[n]$. Therefore, the octave-band analysis filter bank computes the inner products with the basis functions for $W_1, W_2, \dots, W_J$ and $V_J$.

In Figure 3.7(b), after convolution with the synthesis filters, we get the orthogonal projection of the input signal onto $W_1, W_2, \dots, W_J$ and $V_J$. That is, the input is decomposed into a very coarse resolution (which exists in $V_J$) and added details (which exist in the spaces $W_i$, $i = 1, \dots, J$). By (3.3.22), the sum of the coarse version and all the added details yields back the original signal; a result that follows from the perfect reconstruction property of the analysis/synthesis system as well.

We will call the $V_j$'s approximation spaces and the $W_j$'s detail spaces. Then, the process of building up the signal is intuitively very clear: one starts with its lower-resolution version belonging to $V_J$, and adds up the details until the final resolution is reached.
Figure 3.10 Ideal division of the spectrum by the discrete-time wavelet series using sinc filters. Note that the spectra are symmetric around zero. Division into $V_i$ spaces (note how $V_i \subset V_{i-1}$), and resulting $W_i$ spaces. (Actually, $V_j$ and $W_j$ are of height $2^{j/2}$, so they have unit norm.)



It will be seen in Chapter 4 that the decomposition into approximation and detail spaces is very similar to the multiresolution framework for continuous-time signals. However, there are a few important distinctions. First, in the discrete-time case, there is a "finest" resolution, associated with the space $V_0$, that is, one cannot refine the signal further. Then, we are considering a finite number of decomposition steps $J$, thus leading to a "coarsest" resolution, associated with $V_J$. Finally, in the continuous-time case, a single function and its scales and translates are used, whereas here, various iterated filters are involved (which, under certain conditions, resemble scales of each other, as we will see).



Example 3.9 Sinc Case

In the sinc case, introduced in Section 3.1.3, it is very easy to spot the multiresolution flavor. Since the filters used are ideal lowpass/highpass filters, respectively, at each stage the lowpass filter would halve the coarse space, while the highpass filter would take care of the difference between them. The above argument is best seen in Figure 3.10. The original signal (discrete in time and thus its spectrum occupies $(-\pi, \pi)$) is lowpass filtered using the ideal half-band filter. As a result, starting from the space $V_0$, we have derived a lower-resolution signal by halving $V_0$, resulting in $V_1$. Then, an even coarser version is obtained by using the same process, resulting in the space $V_2$. Using the above process repeatedly, one obtains the final coarse (approximation) space $V_J$. Along the way we have created difference spaces, $W_i$, as well.

For example, the space $V_1$ occupies the part $(-\pi/2, \pi/2)$ in the spectrum, while $W_1$ will occupy $(-\pi, -\pi/2) \cup (\pi/2, \pi)$. It can be seen that $g_0[n]$ as defined in (3.1.23), with its even shifts, will constitute a basis for $V_1$, while $g_1[n]$ following (3.3.21) constitutes a basis for $W_1$. In other words, $g_0[n]$, $g_1[n]$ and their even shifts would constitute a basis for the original (starting) space $V_0$ ($\ell^2(\mathbb{Z})$).
Figure 3.11 All possible combinations of tree-structured filter banks of depth 2. Symbolically, a fork stands for a two-channel filter bank with the lowpass on the bottom. From left to right is the full tree (STFT like), the octave-band tree (wavelet), the tree where only the highpass is split further, the two-band tree and finally the nil tree (no split at all). Note that all smaller trees are pruned versions of the full tree.



Because we deal with ideal filters, there is an obvious frequency interpretation. However, one has to be careful with the boundaries between intervals. With our definition of $g_0[n]$ and $g_1[n]$, $\cos((\pi/2)n)$¹⁰ belongs to $V_1$ while $\sin((\pi/2)n)$ belongs to $W_1$.



3.3.4 General Tree-Structured Filter Banks and Wavelet Packets 

A major part of this section was devoted to octave-band, tree-structured filter banks. It is easy to generalize that discussion to arbitrary tree structures, starting from a single two-channel filter bank, all the way through the full grown tree of depth $J$. Consider, for example, Figure 3.11. It shows all possible tree structures of depth less than or equal to two.

Note in particular the full tree, which yields a linear division of the spectrum sim- 
ilar to the short-time Fourier transform, and the octave-band tree, which performs 
a two-step discrete-time wavelet series expansion. Such arbitrary tree structures 
were recently introduced as a family of orthonormal bases for discrete-time signals, 
and are known under the name of wavelet packets [63]. The potential of wavelet 
packets lies in the capacity to offer a rich menu of orthonormal bases, from which 
the "best" one can be chosen ("best" according to a particular criterion). This 
will be discussed in more detail in Chapter 7 when applications in compression are 
considered. What we will do here is define the basis functions and write down
the appropriate orthogonality relations; however, since the octave-band case was
discussed in detail, the proofs will be omitted (for a proof, see [274]).



¹⁰ To be precise, since $\cos((\pi/2)n)$ is not of finite energy and does not belong to $\ell^2(\mathbb{Z})$, one needs to define windowed versions of unit norm and take appropriate limits.
Denote the equivalent filters by $g_i^{(j)}[n]$, $i = 0, \dots, 2^j - 1$. In other words, $g_i^{(j)}$ is the $i$th equivalent filter going through one of the possible paths of length $j$. The ordering is somewhat arbitrary, and we will choose the one corresponding to a full tree with a lowpass in the lower branch of each fork, and start numbering from the bottom.

Example 3.10 

Let us find all equivalent filters in Figure 3.11, or the filters corresponding to depth-1 and depth-2 trees. Since we will be interested in the basis functions, we consider the synthesis filter banks. For simplicity, we do it in z-domain.

$$G_0^{(1)}(z) = G_0(z), \qquad G_1^{(1)}(z) = G_1(z),$$

$$G_0^{(2)}(z) = G_0(z)\, G_0(z^2), \qquad G_1^{(2)}(z) = G_0(z)\, G_1(z^2), \qquad (3.3.23)$$

$$G_2^{(2)}(z) = G_1(z)\, G_0(z^2), \qquad G_3^{(2)}(z) = G_1(z)\, G_1(z^2). \qquad (3.3.24)$$

Note that with the ordering chosen in (3.3.23)-(3.3.24), increasing index does not always correspond to increasing frequency. It can be verified that for ideal filters, $G_2^{(2)}(e^{j\omega})$ chooses the range $[3\pi/4, \pi]$, while $G_3^{(2)}(e^{j\omega})$ covers the range $[\pi/2, 3\pi/4]$ (see Problem 3.16). Besides the identity basis, which corresponds to the no-split situation, we have four possible orthonormal bases, corresponding to the four trees in Figure 3.11. Thus, we have a family $\mathcal{W} = \{W_0, W_1, W_2, W_3, W_4\}$, where $W_4$ is simply $\{\delta[n - k]\}_{k\in\mathbb{Z}}$.

$$W_0 = \{g_0^{(2)}[n - 2^2 k],\, g_1^{(2)}[n - 2^2 k],\, g_2^{(2)}[n - 2^2 k],\, g_3^{(2)}[n - 2^2 k]\}_{k\in\mathbb{Z}},$$

corresponds to the full tree.

$$W_1 = \{g_1^{(1)}[n - 2k],\, g_0^{(2)}[n - 2^2 k],\, g_1^{(2)}[n - 2^2 k]\}_{k\in\mathbb{Z}},$$

corresponds to the octave-band tree.

$$W_2 = \{g_0^{(1)}[n - 2k],\, g_2^{(2)}[n - 2^2 k],\, g_3^{(2)}[n - 2^2 k]\}_{k\in\mathbb{Z}},$$

corresponds to the tree in which only the highpass band is split a second time, and

$$W_3 = \{g_0^{(1)}[n - 2k],\, g_1^{(1)}[n - 2k]\}_{k\in\mathbb{Z}},$$

is simply the usual two-channel filter bank basis.

This small example should have given the intuition behind orthonormal bases 
generated from tree-structured filter banks. In the general case, with filter banks of 
depth J, it can be shown that, counting the no-split tree, the number of orthonormal 
bases satisfies 

$$M_J = M_{J-1}^2 + 1. \qquad (3.3.25)$$
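The recursion (3.3.25) reflects the fact that a tree of depth at most $J$ is either the nil tree (no split) or a single split whose two branches are independent trees of depth at most $J - 1$. The sketch below evaluates the recursion with the convention $M_0 = 1$ (an assumption consistent with $M_1 = 2$ and with the five bases of Example 3.10), and cross-checks it by brute-force enumeration; the function names and the tuple encoding of trees are hypothetical.

```python
def count_bases(J):
    """M_J from (3.3.25), with M_0 = 1 (only the no-split tree)."""
    M = 1
    for _ in range(J):
        M = M * M + 1
    return M


def enumerate_trees(J):
    """All pruned binary trees of depth <= J. None is a leaf (no split);
    a pair (low, high) is a two-channel split whose branches are subtrees."""
    if J == 0:
        return [None]
    sub = enumerate_trees(J - 1)
    return [None] + [(low, high) for low in sub for high in sub]


print([count_bases(J) for J in range(5)])
print([len(enumerate_trees(J)) for J in range(4)])
```

The doubly exponential growth (677 bases already for $J = 4$) is what makes a fast "best basis" search, rather than exhaustive comparison, attractive.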

Among this myriad of bases, there are the STFT-like basis, given by

$$W_0 = \{g_0^{(J)}[n - 2^J k], \dots, g_{2^J - 1}^{(J)}[n - 2^J k]\}_{k\in\mathbb{Z}}, \qquad (3.3.26)$$

and the wavelet-like basis,

$$W_1 = \{g_1^{(1)}[n - 2k],\, g_1^{(2)}[n - 2^2 k], \dots, g_1^{(J)}[n - 2^J k],\, g_0^{(J)}[n - 2^J k]\}_{k\in\mathbb{Z}}. \qquad (3.3.27)$$

It can be shown that the sets of basis functions in (3.3.26) and (3.3.27), as well as 
in all other bases generated by the filter bank tree, are orthonormal (for example, 
along the lines of the proof in the discrete-time wavelet series case). However, this 
would be quite cumbersome. A more immediate proof is sketched here. Note that 
we have a perfect reconstruction system by construction, and that the synthesis 
and the analysis filters are related by time reversal. That is, the inverse operator 
of the analysis filter bank (whatever its particular structure) is its transpose, or 
equivalently, the overall filter bank is orthonormal. Therefore, the impulse responses
of all equivalent filters and their appropriate shifts form an orthonormal basis for
$\ell^2(\mathbb{Z})$.

It is interesting to consider the time-frequency analysis performed by various 
filter banks. This is shown schematically in Figure 3.12 for three particular cases 
of binary trees. Note the different trade-offs in time and frequency resolutions. 

Figure 3.13 shows a dynamic time-frequency analysis, where the time and frequency
resolutions are modified as time evolves. This is achieved by modifying the
frequency split on the fly [132], and can be used for signal compression as discussed 
in Section 7.3.4. 

3.4 Multichannel Filter Banks 

In the previous section, we have seen how one can obtain multichannel filter banks 
by cascading two-channel ones. Although this is a very easy way of achieving 
the goal, one might be interested in designing multichannel filter banks directly. 
Therefore, in this section we will present a brief analysis of $N$-channel filter banks,
as given in Figure 3.14. We start the section by discussing two special cases which
are of interest in applications: the first, block transforms, and the second, lapped
orthogonal transforms. Then, we will formalize our treatment of $N$-channel filter
banks (time-, modulation- and polyphase-domain analyses). Finally, a particular
class of multichannel filter banks, where all filters are obtained by modulating a 
single, prototype filter — called modulated filter banks — is presented. 

3.4.1 Block and Lapped Orthogonal Transforms 

Block Transforms Block transforms, which are used quite frequently in signal compression (for example, the discrete cosine transform), are a special case of filter banks with $N$ channels, filters of length $N$, and downsampling by $N$. Moreover, when such transforms are unitary or orthogonal, they are the simplest examples of orthogonal (also called paraunitary or lossless) $N$-channel filter banks. Let us analyze such filter banks in a manner similar to Section 3.2. The channel signals, after filtering and downsampling, can be expressed as

$$\begin{pmatrix} \vdots \\ y_0[0] \\ \vdots \\ y_{N-1}[0] \\ y_0[1] \\ \vdots \\ y_{N-1}[1] \\ \vdots \end{pmatrix} = \begin{pmatrix} \ddots & & & \\ & A_0 & & \\ & & A_0 & \\ & & & \ddots \end{pmatrix} \begin{pmatrix} \vdots \\ x[0] \\ \vdots \\ x[N-1] \\ x[N] \\ \vdots \\ x[2N-1] \\ \vdots \end{pmatrix}, \qquad (3.4.1)$$

where the block $A_0$ is equal to (similarly to (3.2.3))

$$A_0 = \begin{pmatrix} h_0[N-1] & \cdots & h_0[0] \\ \vdots & & \vdots \\ h_{N-1}[N-1] & \cdots & h_{N-1}[0] \end{pmatrix} = \begin{pmatrix} g_0[0] & \cdots & g_0[N-1] \\ \vdots & & \vdots \\ g_{N-1}[0] & \cdots & g_{N-1}[N-1] \end{pmatrix}. \qquad (3.4.2)$$

The second equality follows since the transform is unitary, that is,

$$A_0 A_0^T = A_0^T A_0 = I. \qquad (3.4.3)$$

Figure 3.12 Time-frequency analysis achieved by different binary subband trees. The trees are on bottom, the time-frequency tilings on top. (a) Full tree or STFT. (b) Octave-band tree or wavelet series. (c) Arbitrary tree or one possible wavelet packet.

Figure 3.13 Dynamic time-frequency analysis achieved by concatenating the analyses from Figure 3.12. The tiling and the evolving tree are shown.

Figure 3.14 $N$-channel analysis/synthesis filter bank with critical downsampling by $N$.



We can see that (3.4.2-3.4.3) imply that 

$$\langle h_i[kN - n],\, h_j[lN - n] \rangle = \langle g_i[n - kN],\, g_j[n - lN] \rangle = \delta[i - j]\, \delta[k - l],$$

that is, we obtained the orthonormality relations for this case. Denoting by $\varphi_{kN+i}[n] = g_i[n - kN]$, we have that the set of basis functions $\{\varphi_{kN+i}[n]\} = \{g_0[n - kN], g_1[n - kN], \dots, g_{N-1}[n - kN]\}$, with $i = 0, \dots, N-1$, and $k \in \mathbb{Z}$, is an orthonormal basis for $\ell^2(\mathbb{Z})$.
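Since the discrete cosine transform is the standard example mentioned above, a minimal sketch of a block transform can take the orthonormal DCT-II as $A_0$ (its rows read as the basis functions $g_i[n]$), apply it to consecutive length-$N$ blocks, and reconstruct with $A_0^T$. The choice of DCT-II, the assumption that the signal length is a multiple of $N$, and all function names are illustrative.

```python
import math


def dct_matrix(N):
    """Orthonormal DCT-II; row i holds the basis function g_i[n], n = 0..N-1."""
    return [[(math.sqrt(1.0 / N) if i == 0 else math.sqrt(2.0 / N))
             * math.cos(math.pi * (n + 0.5) * i / N) for n in range(N)]
            for i in range(N)]


def analyze(x, A):
    """y_i[k] = <g_i[n - kN], x[n]>: apply A_0 to each length-N block."""
    N = len(A)
    return [[sum(A[i][n] * x[k * N + n] for n in range(N)) for i in range(N)]
            for k in range(len(x) // N)]


def synthesize(y, A):
    """x[kN + n] = sum_i y_i[k] g_i[n]: apply A_0^T to each block."""
    N = len(A)
    return [sum(A[i][n] * block[i] for i in range(N))
            for block in y for n in range(N)]


N = 8
A0 = dct_matrix(N)
x = [math.cos(0.2 * n) * (1 + n % 3) for n in range(4 * N)]
x_rec = synthesize(analyze(x, A0), A0)
print(max(abs(a - b) for a, b in zip(x, x_rec)))
```

Because the blocks do not overlap, each block is processed independently, which is exactly the block-diagonal structure of (3.4.1).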

Lapped Orthogonal Transforms Lapped orthogonal transforms (LOT's), introduced by Cassereau [43] and Malvar [189, 188], are a class of $N$-channel unitary filter banks where some additional constraints are imposed. In particular, the length of the filters is restricted to $L = 2N$, or twice the number of channels (or downsampling rate), and thus, it is easy to interpret LOT's as an extension of block transforms where neighboring filters overlap. Usually, the number of channels is even, and sometimes the filters are all obtained from a single prototype window by modulation. In this case, fast algorithms taking advantage of the modulation relation between the filters reduce the order $N^2$ operations per $N$ outputs of the filter bank to $cN \log_2 N$ (see also Chapter 6). This computational efficiency, as well as the
simplicity and close relationship to block transforms, has made LOT's quite pop- 
ular. A related class of filter banks, called time-domain aliasing cancellation filter 
banks, studied by Princen and Bradley [229] can be seen as another interpretation 
of LOT's. For an excellent treatment of LOT's, see the book by Malvar [188], to 
which we refer for more details. 

Let us examine the lapped orthogonal transform. First, the fact that the filter length is $2N$ means that the time-domain matrix analogous to the one in (3.4.1) has the following form:

$$T_a = \begin{pmatrix} \ddots & \ddots & & & \\ & A_0 & A_1 & & \\ & & A_0 & A_1 & \\ & & & \ddots & \ddots \end{pmatrix}, \qquad (3.4.4)$$

that is, it has a double block diagonal. The fact that $T_a$ is orthogonal, or $T_a T_a^T = T_a^T T_a = I$, yields

$$A_0^T A_0 + A_1^T A_1 = A_0 A_0^T + A_1 A_1^T = I, \qquad (3.4.5)$$

as well as

$$A_0^T A_1 = A_1^T A_0 = 0, \qquad A_0 A_1^T = A_1 A_0^T = 0. \qquad (3.4.6)$$

The property (3.4.6) is called orthogonality of tails since overlapping tails of the basis functions are orthogonal to each other. Note that these conditions characterize nothing but an $N$-channel orthogonal filter bank, with filters of length $2N$ and downsampling by $N$. To obtain certain classes of LOT's, one imposes additional constraints. For example, in Section 3.4.3, we will consider a cosine modulated filter bank.
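Conditions (3.4.5)-(3.4.6) can be checked numerically on a concrete lapped transform. The sketch below assumes one particular choice not made in the text at this point: Malvar's modulated lapped transform (MLT), a cosine-modulated filter bank with a sine window and filters of length $2N$. Its $N \times 2N$ matrix is split into $[A_0 \mid A_1]$; the function names are hypothetical. Each printed deviation should be at the level of floating-point round-off.

```python
import math


def mlt_blocks(N):
    """Assumed example: N x 2N MLT matrix split as [A0 | A1]. Entry (k, n) is
    sqrt(2/N) * sin(pi (n + 1/2) / (2N)) * cos(pi/N (n + 1/2 + N/2)(k + 1/2))."""
    rows = [[math.sqrt(2.0 / N) * math.sin(math.pi * (n + 0.5) / (2 * N))
             * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
             for n in range(2 * N)] for k in range(N)]
    return [r[:N] for r in rows], [r[N:] for r in rows]


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def madd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def max_dev(M, identity):
    """Largest entrywise deviation of M from I (identity=True) or from 0."""
    return max(abs(M[i][j] - (1.0 if identity and i == j else 0.0))
               for i in range(len(M)) for j in range(len(M[0])))


def lot_conditions(N):
    """Deviations from the LOT conditions (3.4.5)-(3.4.6) for the MLT blocks."""
    A0, A1 = mlt_blocks(N)
    T0, T1 = transpose(A0), transpose(A1)
    return {
        "A0^T A0 + A1^T A1 = I": max_dev(madd(matmul(T0, A0), matmul(T1, A1)), True),
        "A0 A0^T + A1 A1^T = I": max_dev(madd(matmul(A0, T0), matmul(A1, T1)), True),
        "A0^T A1 = 0": max_dev(matmul(T0, A1), False),
        "A0 A1^T = 0": max_dev(matmul(A0, T1), False),
    }


for name, err in lot_conditions(8).items():
    print(f"{name}: max deviation {err:.1e}")
```

The remaining identities in (3.4.6), $A_1^T A_0 = 0$ and $A_1 A_0^T = 0$, are transposes of the two that are checked.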
Generalizations What we have seen in these two simple cases is how to obtain $N$-channel filter banks with filters of length $N$ (block transforms) and filters of length $2N$ (lapped orthogonal transforms). It is obvious that by allowing longer filters, or more blocks $A_i$ in (3.4.4), we can obtain general $N$-channel filter banks.

3.4.2 Analysis of Multichannel Filter Banks 

The analysis of $N$-channel filter banks is in many ways analogous to that of
two-channel filter banks; therefore, the treatment here will be fairly brisk, with
references to Section 3.2.

Time-Domain Analysis We can proceed here exactly as in Section 3.2.1. Thus, we can
say that the channel outputs (or transform coefficients) in Figure 3.14 can be
expressed as in (3.2.1)

$$y = X = T_a\, x,$$

where the vector of transform coefficients is $X$, with $X[Nk + i] = y_i[k]$. The
analysis matrix $T_a$ is given as in (3.2.2) with blocks $A_i$ of the form

$$A_i = \begin{pmatrix} h_0[NK-1-Ni] & \cdots & h_0[NK-N-Ni] \\ \vdots & & \vdots \\ h_{N-1}[NK-1-Ni] & \cdots & h_{N-1}[NK-N-Ni] \end{pmatrix}.$$

When the filters are of length $L = KN$, there are $K$ blocks $A_i$ of size
$N \times N$ each. Similarly to (3.2.4)-(3.2.5), we see that the basis functions of
the first basis corresponding to the analysis are

$$\varphi_{Nk+i}[n] = h_i[Nk - n].$$

Defining the synthesis matrix as in (3.2.7), we obtain the basis functions of the
dual basis

$$\tilde{\varphi}_{Nk+i}[n] = g_i[n - Nk],$$

and they satisfy the following biorthogonality relations:

$$\langle \varphi_k[n], \tilde{\varphi}_l[n] \rangle = \delta[k - l],$$

which can be expressed in terms of analysis/synthesis matrices as

$$T_s T_a = I.$$
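To make the relation $T_s T_a = I$ concrete, here is a minimal Python sketch of the time-domain analysis and synthesis for the simplest case $L = N$ (a block transform, so $K = 1$). It assumes orthonormal DCT-II vectors as the synthesis filters $g_i$ and their time reversals $h_i[n] = g_i[-n]$ as the analysis filters; the DCT choice, the test signal and the helper names `dct_filters`, `analysis` and `synthesis` are illustrative, not from the text, and the signal length is taken to be a multiple of $N$.

```python
import math

def dct_filters(N):
    """Orthonormal DCT-II vectors, used here as synthesis filters g_i[n], n = 0..N-1."""
    g = []
    for i in range(N):
        scale = math.sqrt(1.0 / N) if i == 0 else math.sqrt(2.0 / N)
        g.append([scale * math.cos(math.pi * (2 * n + 1) * i / (2 * N)) for n in range(N)])
    return g

def analysis(x, g, N):
    """Channel outputs y_i[k] = sum_n h_i[Nk - n] x[n], with h_i[m] = g_i[-m].

    Since h_i[Nk - n] = g_i[n - Nk] is nonzero only for Nk <= n < Nk + N,
    y_i[k] is the inner product of block k of x with g_i (a row of T_a).
    """
    K = len(x) // N
    return [[sum(g[i][n - N * k] * x[n] for n in range(N * k, N * k + N))
             for k in range(K)] for i in range(N)]

def synthesis(y, g, N):
    """x_hat[n] = sum_i sum_k g_i[n - Nk] y_i[k] (the columns of T_s)."""
    K = len(y[0])
    x_hat = [0.0] * (N * K)
    for i in range(N):
        for k in range(K):
            for m in range(N):
                x_hat[N * k + m] += g[i][m] * y[i][k]
    return x_hat

N = 4
x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
g = dct_filters(N)
y = analysis(x, g, N)
x_hat = synthesis(y, g, N)
print(max(abs(a - b) for a, b in zip(x, x_hat)))  # round-off only, since T_s T_a = I
```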

As was done in Section 3.2, we can define single operators for each branch. If the
operator $H_i$ represents filtering by $h_i$ followed by downsampling by $N$, its
matrix representation is

$$H_i = \begin{pmatrix} \ddots & & & & & & \\ \cdots & h_i[L-1] & \cdots & h_i[L-N] & h_i[L-N-1] & \cdots & \cdots \\ \cdots & 0 & \cdots & 0 & h_i[L-1] & \cdots & \cdots \\ & & & & & & \ddots \end{pmatrix}.$$

Defining $G_i$ similarly to $H_i$ (except that there is no time reversal), the output
of the system can then be written as

$$\hat{x} = \sum_{i=0}^{N-1} G_i^T H_i\, x.$$

Then, the condition for perfect reconstruction is

$$\sum_{i=0}^{N-1} G_i^T H_i = I.$$

We leave the details and proofs of the above relationships as an exercise (Problem 
3.21), since they are simple extensions of the two-channel case seen in Section 3.2. 



Modulation-Domain Analysis Let us turn our attention to filter banks represented in
the modulation domain. We write directly the expressions we need in the z-domain. One
can verify that downsampling a signal $x[n]$ by $N$ followed by upsampling by $N$
(that is, replacing $x[n]$, $n \bmod N \neq 0$, by 0) produces a signal $y[n]$ with
z-transform $Y(z)$ equal to

$$Y(z) = \frac{1}{N} \sum_{i=0}^{N-1} X(W_N^i z), \qquad W_N = e^{-j2\pi/N},$$

because of the orthogonality of the roots of unity. Then, the output of the system in
Figure 3.14 becomes, in a similar fashion to (3.2.14),

$$\hat{X}(z) = \frac{1}{N}\, g^T(z)\, H_m(z)\, x_m(z),$$

where $g^T(z) = (G_0(z)\;\; \cdots\;\; G_{N-1}(z))$ is the vector containing the
synthesis filters, $x_m(z) = (X(z)\;\; X(W_N z)\;\; \cdots\;\; X(W_N^{N-1} z))^T$, and
the $i$th row of $H_m(z)$ is equal to

$$\left(H_i(z)\;\; H_i(W_N z)\;\; \cdots\;\; H_i(W_N^{N-1} z)\right), \qquad i = 0, \ldots, N-1.$$

Then, similarly to the two-channel case, to cancel aliasing, $g^T H_m$ has to have all
elements equal to zero, except for the first one. To obtain perfect reconstruction,
this only nonzero element has to be equal to a scaled pure delay.
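The downsample/upsample identity used at the start of this paragraph can be checked numerically. In the time domain, $X(W_N^i z)$ is the z-transform of $W_N^{-in} x[n]$, so averaging over $i$ multiplies each sample by $\frac{1}{N}\sum_i W_N^{-in}$, which is 1 when $N$ divides $n$ and 0 otherwise. The Python sketch below (helper names and test signal are illustrative) compares this with an explicit downsampling followed by upsampling.

```python
import cmath

def down_up(x, N):
    """Downsample by N, then upsample by N: keep x[n] where n mod N == 0, zero elsewhere."""
    return [v if n % N == 0 else 0.0 for n, v in enumerate(x)]

def modulation_average(x, N):
    """Time-domain version of (1/N) * sum_i X(W_N^i z), W_N = exp(-j*2*pi/N).

    X(W_N^i z) is the z-transform of W_N^(-i*n) x[n], so each sample is scaled by
    (1/N) * sum_i W_N^(-i*n), which is 1 if N divides n and 0 otherwise.
    """
    W = cmath.exp(-2j * cmath.pi / N)
    return [v * sum(W ** (-i * n) for i in range(N)) / N for n, v in enumerate(x)]

x = [1.0, -2.0, 3.5, 0.25, 7.0, -1.0, 2.0, 4.0, 0.5]
N = 3
a = down_up(x, N)
b = modulation_average(x, N)
print(max(abs(p - q) for p, q in zip(a, b)))  # round-off only
```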

As in the two-channel case, it can be shown that the perfect reconstruction 
condition is equivalent to the system being biorthogonal, as given earlier. The 
proof is left as an exercise for the reader (Problem 3.21). For completeness, let us 
define $G_m(z)$ as the matrix with the $i$th row equal to

$$\left(G_0(W_N^i z)\;\; G_1(W_N^i z)\;\; \cdots\;\; G_{N-1}(W_N^i z)\right).$$

Polyphase-Domain Analysis The gist of the polyphase analysis of two-channel filter
banks downsampled by 2 was to expand signals and filter impulse responses into
even- and odd-indexed components (together with some adequate phase terms). Quite
naturally, in the $N$-channel case with downsampling by $N$, there will be $N$
polyphase components. We follow the same definitions as in Section 3.2.1 (the choice
of the phase in the polyphase component is arbitrary, but consistent). Thus, the
input signal can be decomposed into its polyphase components as

$$X(z) = \sum_{j=0}^{N-1} z^{-j} X_j(z^N),$$

where

$$X_j(z) = \sum_{n=-\infty}^{\infty} x[nN + j]\, z^{-n}.$$

Define the polyphase vector as

$$x_p(z) = (X_0(z)\;\; X_1(z)\;\; \cdots\;\; X_{N-1}(z))^T.$$

The polyphase components of the synthesis filter $g_i$ are defined similarly, that is,

$$G_i(z) = \sum_{j=0}^{N-1} z^{-j} G_{ij}(z^N),$$

where

$$G_{ij}(z) = \sum_{n=-\infty}^{\infty} g_i[nN + j]\, z^{-n}.$$

The polyphase matrix of the synthesis filter bank is given by

$$[G_p(z)]_{ji} = G_{ij}(z),$$
where the implicit transposition should be noticed. Up to a phase factor and a
transpose, the analysis filter bank is decomposed similarly. The filter is written as

$$H_i(z) = \sum_{j=0}^{N-1} z^{j} H_{ij}(z^N), \qquad (3.4.7)$$

where

$$H_{ij}(z) = \sum_{n=-\infty}^{\infty} h_i[nN - j]\, z^{-n}. \qquad (3.4.8)$$

The analysis polyphase matrix is then defined as follows:

$$[H_p(z)]_{ij} = H_{ij}(z).$$

For example, the vector of channel signals,

$$y(z) = (Y_0(z)\;\; Y_1(z)\;\; \cdots\;\; Y_{N-1}(z))^T,$$

can be compactly written as

$$y(z) = H_p(z)\, x_p(z).$$
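The compact form $y(z) = H_p(z)\,x_p(z)$ can be checked one channel at a time: row $i$ says $Y_i(z) = \sum_j H_{ij}(z) X_j(z)$. The following Python sketch computes one channel both directly, $y_i[k] = \sum_n h_i[Nk-n]\,x[n]$, and through the polyphase components of (3.4.8), and compares the results. The helper names and random test data are illustrative, and the filter and signal are assumed to be finite and to start at index 0.

```python
import random

def filter_and_downsample(x, h, N):
    """Direct form: y[k] = sum_n h[Nk - n] x[n] (filter, then keep every Nth sample)."""
    L, n_len = len(h), len(x)
    last = (n_len - 1 + L - 1) // N
    return [sum(h[N * k - n] * x[n] for n in range(n_len) if 0 <= N * k - n < L)
            for k in range(last + 1)]

def polyphase_analysis(x, h, N):
    """Same channel via polyphase components: Y(z) = sum_j H_j(z) X_j(z), where
    h_j[m] = h[mN - j] as in (3.4.8) and x_j[m] = x[mN + j]."""
    L, n_len = len(h), len(x)
    last = (n_len - 1 + L - 1) // N
    y = [0.0] * (last + 1)
    for j in range(N):
        h_j = {m: h[m * N - j] for m in range(L // N + 2) if 0 <= m * N - j < L}
        x_j = {m: x[m * N + j] for m in range(n_len // N + 1) if m * N + j < n_len}
        for a, hv in h_j.items():  # ordinary convolution of h_j with x_j
            for b, xv in x_j.items():
                y[a + b] += hv * xv
    return y

random.seed(0)
N = 3
h = [random.uniform(-1, 1) for _ in range(7)]
x = [random.uniform(-1, 1) for _ in range(11)]
direct = filter_and_downsample(x, h, N)
poly = polyphase_analysis(x, h, N)
print(max(abs(p - q) for p, q in zip(direct, poly)))  # round-off only
```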

Putting it all together, the output of the analysis/synthesis filter bank in Figure
3.14 can be written as

$$\hat{X}(z) = (1\;\; z^{-1}\;\; z^{-2}\;\; \cdots\;\; z^{-N+1})\, G_p(z^N)\, H_p(z^N)\, x_p(z^N).$$

Similarly to the two-channel case, we can define the transfer function matrix
$T_p(z) = G_p(z) H_p(z)$. Then, the same results hold as in the two-channel case.
Here, we just state them (the proofs are $N$-channel counterparts of the two-channel
ones).

THEOREM 3.16 Multichannel Filter Banks

(a) Aliasing in a one-dimensional system is cancelled if and only if the transfer
function matrix is pseudo-circulant [311].

(b) Given an analysis filter bank downsampled by $N$ with polyphase matrix $H_p(z)$,
alias-free reconstruction is possible if and only if the normal rank of $H_p(z)$ is
equal to $N$.

(c) Given a critically sampled FIR analysis filter bank, perfect reconstruction with
FIR filters is possible if and only if $\det(H_p(z))$ is a pure delay.
Note that the modulation and polyphase representations are related via the Fourier
matrix. For example, one can verify that

$$x_p(z^N) = \frac{1}{N} \begin{pmatrix} 1 & & & \\ & z & & \\ & & \ddots & \\ & & & z^{N-1} \end{pmatrix} F\, x_m(z), \qquad (3.4.9)$$

where $F_{kl} = W_N^{kl} = e^{-j(2\pi/N)kl}$. Similar relationships hold between
$H_m(z)$, $G_m(z)$ and $H_p(z)$, $G_p(z)$, respectively (see Problem 3.22). The
important point to note is that modulation and polyphase matrices are related by
unitary operations (such as $F$ and delays as in (3.4.9)).

Orthogonal Multichannel FIR Filter Banks Let us now consider the particular but
important case when the filter bank is unitary or orthogonal. This is an extension of
the discussion in Section 3.2.3 to the $N$-channel case. The idea is to implement an
orthogonal transform using an $N$-channel filter bank, or, in other words, we want
the following set:

$$\{g_0[n - Nk], \ldots, g_{N-1}[n - Nk]\}, \qquad k \in \mathbb{Z},$$

to be an orthonormal basis for $l_2(\mathbb{Z})$. Then

$$\langle g_i[n - Nk],\, g_j[n - Nl] \rangle = \delta[i - j]\, \delta[l - k]. \qquad (3.4.10)$$

Since in the orthogonal case analysis and synthesis filters are identical up to a time
reversal, (3.4.10) holds for $h_i[Nk - n]$ as well. By using (2.5.19), (3.4.10) can
be expressed in the z-domain as

$$\sum_{k=0}^{N-1} G_i(W_N^k z)\, G_j(W_N^{-k} z^{-1}) = N\, \delta[i - j], \qquad (3.4.11)$$

or

$$G_{m*}^T(z^{-1})\, G_m(z) = N I,$$

where the subscript $*$ stands for conjugation of the coefficients but not of $z$
(this is necessary since $G_m(z)$ has complex coefficients). Thus, as in the
two-channel case, having an orthogonal transform is equivalent to having a
paraunitary modulation matrix. Unlike the two-channel case, however, not all of the
filters are obtained from a single prototype filter.

Since modulation and polyphase matrices are related, it is easy to check that having
a paraunitary modulation matrix is equivalent to having a paraunitary polyphase
matrix, that is,

$$G_{m*}^T(z^{-1})\, G_m(z) = N I \iff G_p^T(z^{-1})\, G_p(z) = I. \qquad (3.4.12)$$
Finally, in the time domain,

$$G_i G_j^T = \delta[i - j]\, I, \qquad i, j = 0, \ldots, N-1,$$

or

$$T_a^T T_a = I.$$

The above relations lead to a direct extension of Theorem 3.8, where the particular
case $N = 2$ was considered.

Thus, according to (3.4.12), designing an orthogonal filter bank with $N$ channels
reduces to finding $N \times N$ paraunitary matrices. Just as in the two-channel case,
where we saw a lattice realization of orthogonal filter banks (see (3.2.60)),
$N \times N$ paraunitary matrices can be parametrized in terms of cascades of
elementary matrices ($2 \times 2$ rotations and delays). Such parametrizations have
been investigated by Vaidyanathan, and we refer to his book [308] for a thorough
treatment. An overview can be found in Appendix 3.A.2. As an example, we will see how
to construct three-channel paraunitary filter banks.



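Example 3.11

We use the factorization given in Appendix 3.A.2, (3.A.8). Thus, we can express the
$3 \times 3$ polyphase matrix as

$$G_p(z) = U_0 \prod_{i=1}^{K-1} \left[\begin{pmatrix} z^{-1} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} U_i\right],$$

so that the filters have length $L = 3K$, where

$$U_0 = \begin{pmatrix} \cos\alpha_{00} & -\sin\alpha_{00} & 0 \\ \sin\alpha_{00} & \cos\alpha_{00} & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\alpha_{01} & 0 & -\sin\alpha_{01} \\ 0 & 1 & 0 \\ \sin\alpha_{01} & 0 & \cos\alpha_{01} \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha_{02} & -\sin\alpha_{02} \\ 0 & \sin\alpha_{02} & \cos\alpha_{02} \end{pmatrix},$$

and

$$U_i = \begin{pmatrix} \cos\alpha_{i0} & -\sin\alpha_{i0} & 0 \\ \sin\alpha_{i0} & \cos\alpha_{i0} & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha_{i1} & -\sin\alpha_{i1} \\ 0 & \sin\alpha_{i1} & \cos\alpha_{i1} \end{pmatrix}, \qquad i = 1, \ldots, K-1.$$

The degrees of freedom are given by the angles $\alpha_{ij}$. To obtain the three
synthesis filters, we upsample the polyphase matrix, and thus

$$[G_0(z)\;\; G_1(z)\;\; G_2(z)] = [1\;\; z^{-1}\;\; z^{-2}]\, G_p(z^3).$$

To design actual filters, one could minimize an objective function such as the one
given in [306], where the sum of all the stopbands was minimized.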
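As an illustration of Example 3.11, the Python sketch below builds a three-channel paraunitary polyphase matrix with a single delay stage ($K = 2$, so the filters have length 6) from random angles and checks the orthonormality relations (3.4.10). It assumes one particular ordering of the Givens rotations inside $U_0$ and $U_1$ (any ordering gives a paraunitary matrix); `rot`, `paraunitary_3ch` and the other helper names are hypothetical, and no stopband optimization in the spirit of [306] is attempted.

```python
import math
import random

def rot(p, q, angle, n=3):
    """n x n Givens rotation acting in the (p, q) coordinate plane."""
    R = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    c, s = math.cos(angle), math.sin(angle)
    R[p][p], R[p][q], R[q][p], R[q][q] = c, -s, s, c
    return R

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]

def paraunitary_3ch(a0, a1):
    """G_p(z) = U_0 Lambda(z) U_1 with Lambda(z) = diag(z^-1, 1, 1), one delay stage.

    Returns [C0, C1] such that G_p(z) = C0 + z^-1 C1.
    """
    U0 = matmul(matmul(rot(0, 1, a0[0]), rot(0, 2, a0[1])), rot(1, 2, a0[2]))
    U1 = matmul(rot(0, 1, a1[0]), rot(1, 2, a1[1]))
    D0 = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # z^0 part of Lambda(z)
    D1 = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # z^-1 part of Lambda(z)
    return [matmul(matmul(U0, D0), U1), matmul(matmul(U0, D1), U1)]

def synthesis_filters(C, N=3):
    """[G_0 G_1 G_2] = [1 z^-1 z^-2] G_p(z^3), i.e. g_i[mN + j] = [C_m]_{j,i}."""
    return [[C[m][j][i] for m in range(len(C)) for j in range(N)] for i in range(N)]

def inner_shift(a, b, s):
    """<a[n], b[n - s]> for finite sequences that start at n = 0."""
    return sum(a[n] * b[n - s] for n in range(len(a)) if 0 <= n - s < len(b))

random.seed(1)
angles0 = [random.uniform(0, 2 * math.pi) for _ in range(3)]
angles1 = [random.uniform(0, 2 * math.pi) for _ in range(2)]
g = synthesis_filters(paraunitary_3ch(angles0, angles1))  # three filters of length 6

# (3.4.10): <g_i[n - 3k], g_j[n - 3l]> = delta[i - j] delta[l - k]; only shifts 0, +-3 overlap
worst = max(abs(inner_shift(g[i], g[j], s) - (1.0 if i == j and s == 0 else 0.0))
            for i in range(3) for j in range(3) for s in (-3, 0, 3))
print(worst)  # round-off only
```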
It is worth mentioning that orthogonal filter banks with more than two channels have
greater design freedom. It is possible to obtain orthogonal linear phase FIR solutions
[275, 321], a solution which was impossible for two channels (see Appendix 3.A.2).

3.4.3 Modulated Filter Banks 

We will now examine a particular class of $N$-channel filter banks, namely modulated
filter banks. The name stems from the fact that all the filters in the analysis bank
are obtained by modulating a single prototype filter. If we impose orthogonality as
well, the synthesis filters will obviously be modulated as well. The first class we
consider imitates the short-time Fourier transform (STFT), but in the discrete-time
domain. The second one, cosine modulated filter banks, is an interesting counterpart
to the STFT, and when the length of the filters is restricted to $2N$, it is an
example of a modulated LOT.

Short-Time Fourier Transform in the Discrete-Time Domain The short-time Fourier or
Gabor transform [204, 226] is a very popular tool for nonstationary signal analysis
(see Section 2.6.3). It has an immediate filter bank interpretation. Assume a window
function $h_{pr}[n]$ with a corresponding z-transform $H_{pr}(z)$. This window
function is a prototype lowpass filter with a bandwidth of $2\pi/N$, which is then
modulated evenly over the frequency spectrum using consecutive powers of the $N$th
root of unity

$$H_i(z) = H_{pr}(W_N^i z), \qquad i = 0, \ldots, N-1, \qquad W_N = e^{-j2\pi/N}, \qquad (3.4.13)$$

or

$$h_i[n] = W_N^{-in}\, h_{pr}[n]. \qquad (3.4.14)$$

That is, if $H_{pr}(e^{j\omega})$ is a lowpass filter centered around $\omega = 0$,
then $H_i(e^{j\omega})$ is a bandpass filter centered around $\omega = 2\pi i/N$.
Note that the prototype window is usually real, but the bandpass filters are complex.
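The frequency shift in (3.4.13)-(3.4.14) is easy to confirm numerically. The Python sketch below modulates a real prototype and checks that $H_i(e^{j\omega}) = H_{pr}(e^{j(\omega - 2\pi i/N)})$, and that each modulated filter reaches the prototype's DC gain at $\omega = 2\pi i/N$. The window `h_pr`, the test frequency and the helper names are hypothetical choices for illustration.

```python
import cmath
import math

def dtft(h, w):
    """H(e^{jw}) = sum_n h[n] e^{-jwn} for a finite sequence starting at n = 0."""
    return sum(v * cmath.exp(-1j * w * n) for n, v in enumerate(h))

def modulated_bank(h_pr, N):
    """h_i[n] = W_N^(-i n) h_pr[n], W_N = exp(-j 2 pi / N), as in (3.4.14)."""
    W = cmath.exp(-2j * math.pi / N)
    return [[W ** (-i * n) * v for n, v in enumerate(h_pr)] for i in range(N)]

N = 4
h_pr = [0.25, 0.5, 0.75, 1.0, 1.0, 0.75, 0.5, 0.25]  # hypothetical real lowpass window
bank = modulated_bank(h_pr, N)

# H_i(e^{jw}) is H_pr(e^{jw}) shifted to the right by 2*pi*i/N
w = 0.3
for i, h_i in enumerate(bank):
    shifted = dtft(h_pr, w - 2 * math.pi * i / N)
    print(i, abs(dtft(h_i, w) - shifted))  # round-off only
```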

In the short-time Fourier transform, the window is advanced by $M$ samples at a time,
which corresponds to a downsampling by $M$ of the corresponding filter bank. This
filter bank interpretation of the short-time Fourier transform analysis is depicted
in Figure 3.15. The short-time Fourier transform synthesis is achieved similarly with
a modulated synthesis filter bank. Usually, $M$ is chosen smaller than $N$ (for
example, $N/2$), and then it is an oversampled scheme, or a noncritically sampled
filter bank. Let us now consider what happens if we critically sample such a filter
bank, that is, downsample by $N$. Compute a critically sampled discrete short-time
Fourier (or Gabor) transform, where the window function is given by the prototype
filter. It is easy to verify the following negative result [315] (which is a
discrete-time equivalent of the Balian-Low theorem, given in Section 5.3.3):
Figure 3.15 A noncritically sampled filter bank; it has $N$ branches followed by
sampling by $M$ ($N > M$). When the filters are modulated versions (by the $N$th root
of unity), this implements a discrete-time version of the short-time Fourier
transform.



Theorem 3.17 



There are no finite-support bases with filters as in (3.4.13) (except trivial ones 
with only N nonzero coefficients). 



Proof 



The proof consists in analyzing the polyphase matrix $H_p(z)$. Write the prototype
filter $H_{pr}(z)$ in terms of its polyphase components (see (3.4.7)-(3.4.8)):

$$H_{pr}(z) = \sum_{j=0}^{N-1} z^{j} H_{pr_j}(z^N),$$

where $H_{pr_j}(z)$ is the $j$th polyphase component of $H_{pr}(z)$. Obviously,
following (3.4.7) and (3.4.13),

$$H_i(z) = \sum_{j=0}^{N-1} W_N^{ij} z^{j} H_{pr_j}(z^N).$$

Therefore, the polyphase matrix $H_p(z)$ has entries

$$[H_p(z)]_{ij} = W_N^{ij} H_{pr_j}(z).$$

Then, $H_p(z)$ can be factored as

$$H_p(z) = F \begin{pmatrix} H_{pr_0}(z) & & & \\ & H_{pr_1}(z) & & \\ & & \ddots & \\ & & & H_{pr_{N-1}}(z) \end{pmatrix}, \qquad (3.4.15)$$

where $F_{kl} = W_N^{kl} = e^{-j(2\pi/N)kl}$. For FIR perfect reconstruction, the
determinant of $H_p(z)$ has to be a delay (by Theorem 3.16). Now,

$$\det(H_p(z)) = c \prod_{j=0}^{N-1} H_{pr_j}(z),$$

where $c$ is a complex number equal to $\det(F)$. Therefore, for perfect FIR
reconstruction, each $H_{pr_j}(z)$ has to be of the form $\alpha_j z^{-m_j}$, that
is, the prototype filter has exactly $N$ nonzero coefficients. For an orthogonal
solution, the $\alpha_j$'s have to be unit-norm constants.

What happens if we relax the FIR requirement? For example, one can choose the
following prototype:

$$H_{pr}(z) = \sum_{i=0}^{N-1} P_i(z^N)\, z^{i}, \qquad (3.4.16)$$

where $P_i(z)$ are allpass filters. The factorization (3.4.15) still holds, with
$H_{pr_i}(z) = P_i(z)$, and since $P_i(z^{-1}) P_i(z) = 1$, $H_p(z)$ is paraunitary.
While this gives an orthogonal modulated filter bank, it is IIR (either analysis or
synthesis will be noncausal), and the quality of the filter in (3.4.16) can be poor.



Cosine Modulated Filter Banks The problems linked to complex modulated filter banks
can be solved by using appropriate cosine modulation. Such cosine modulated filter
banks are very important in practice, for example in audio compression (see Section
7.2.2). Since they are often of length $L = 2N$ (where $N$ is the downsampling rate),
they are sometimes referred to as modulated LOT's, or MLT's. A popular version was
proposed in [229] and is thus called the Princen-Bradley filter bank. We will study
one class of cosine modulated filter banks in some depth, and refer to [188, 308] for
a more general and detailed treatment. The cosine modulated filter banks we consider
here are a particular case of pseudoquadrature mirror filter banks (PQMF) when the
filter length is restricted to twice the number of channels, $L = 2N$. Pseudo QMF
filters have been proposed as an extension to $N$ channels of the classical
two-channel QMF filters. Pseudo QMF analysis/synthesis systems in general achieve
only cancellation of the main aliasing term (aliasing from neighboring channels).
However, when the filter length is restricted to $L = 2N$, they can achieve perfect
reconstruction. Due to the modulated structure, and just as in the STFT case, there
are fast computational algorithms, making such filter banks attractive for
implementations.

A family of PQMF filter banks that achieves cancellation of the main aliasing term is
of the form [188, 321]¹¹

$$h_k[n] = \frac{1}{\sqrt{N}}\, h_{pr}[n] \cos\left(\frac{\pi(2k+1)}{2N}\left(n - \frac{L-1}{2}\right) + \phi_k\right), \qquad (3.4.17)$$

for the analysis filters ($h_{pr}[n]$ is the impulse response of the window). The
modulating frequencies of the cosines are at $\pi/2N, 3\pi/2N, \ldots, (2N-1)\pi/2N$,
and the prototype window is a lowpass filter with support $[-\pi/2N, \pi/2N]$. Then,
the $k$th filter is a bandpass filter with support from $k\pi/N$ to $(k+1)\pi/N$ (and
a mirror image from $-k\pi/N$ to $-(k+1)\pi/N$), thus covering the range from $0$ to
$\pi$ evenly. Note that for $k = 0$ and $k = N-1$, the two lobes merge into a single
lowpass and highpass filter, respectively. In the general case, the main aliasing
term is canceled for the following possible value of the phase:

$$\phi_k = \frac{\pi}{4} + k\,\frac{\pi}{2}.$$

For this value of phase, and in the special case $L = 2N$, exact reconstruction is
achieved. This yields filters of the form

$$h_k[n] = \frac{1}{\sqrt{N}}\, h_{pr}[n] \cos\left(\frac{2k+1}{4N}\,(2n - N + 1)\,\pi\right), \qquad (3.4.18)$$

for $k = 0, \ldots, N-1$, $n = 0, \ldots, 2N-1$. Since the filter length is $2N$, we
have an LOT, and we can use the formalism in (3.4.4). It can be shown that, due to
the particular structure of the filters, (3.4.5)-(3.4.6) hold if $h_{pr}[n] = 1$,
$n = 0, \ldots, 2N-1$. The idea of the proof is the following (we assume $N$ to be
even). Being of length $2N$, each filter has a left and a right tail of length $N$.
It can be verified that with the above choice of phase, all the filters have
symmetric left tails ($h_k[N/2-1-l] = h_k[N/2+l]$ for $l = 0, \ldots, N/2-1$) and
antisymmetric right tails ($h_k[3N/2-1-l] = -h_k[3N/2+l]$ for
$l = 0, \ldots, N/2-1$). Then, orthogonality of the tails (see (3.4.6)) follows
because the product of a left and a right tail is an odd function and therefore sums
to zero. Additionally, each filter is orthogonal to its modulated versions and has
norm 1, and thus we have an orthonormal LOT. The details are left as an exercise (see
Problem 3.24).
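The claim that the unwindowed filters (3.4.18) form an orthonormal LOT can be checked numerically. The Python sketch below builds $h_k[n]$ with $h_{pr}[n] = 1$, takes $A_0$ as samples $2N-1, \ldots, N$ and $A_1$ as samples $N-1, \ldots, 0$ of each filter (our reading of the block layout of $T_a$), and reports the largest deviation from (3.4.5)-(3.4.6). The helper names and the choice $N = 4$ are assumptions for the example.

```python
import math

def mlt_filters(N, h_pr):
    """h_k[n] = h_pr[n] cos((2k+1)(2n-N+1) pi / (4N)) / sqrt(N), n = 0..2N-1 (3.4.18)."""
    return [[h_pr[n] * math.cos((2 * k + 1) * (2 * n - N + 1) * math.pi / (4 * N)) / math.sqrt(N)
             for n in range(2 * N)] for k in range(N)]

def lot_blocks(h, N):
    """A_0: samples 2N-1, ..., N of each filter; A_1: samples N-1, ..., 0."""
    A0 = [[row[n] for n in range(2 * N - 1, N - 1, -1)] for row in h]
    A1 = [[row[n] for n in range(N - 1, -1, -1)] for row in h]
    return A0, A1

def mul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]

def tr(A):
    return [list(col) for col in zip(*A)]

def lot_defect(A0, A1):
    """Largest deviation from (3.4.5) (sums equal I) and (3.4.6) (tail products vanish)."""
    n = len(A0)
    eye = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    sums = [(mul(tr(A0), A0), mul(tr(A1), A1)), (mul(A0, tr(A0)), mul(A1, tr(A1)))]
    tails = [mul(tr(A0), A1), mul(A0, tr(A1))]
    err = max(abs(P[r][c] + Q[r][c] - eye[r][c])
              for P, Q in sums for r in range(n) for c in range(n))
    return max([err] + [abs(T[r][c]) for T in tails for r in range(n) for c in range(n)])

N = 4                                  # even, as assumed in the tail-symmetry argument
h = mlt_filters(N, [1.0] * (2 * N))    # rectangular window h_pr[n] = 1
A0, A1 = lot_blocks(h, N)
print(lot_defect(A0, A1))              # round-off only
```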

¹¹ The derivation of this type of filter bank is somewhat technical and thus less
explicit at times than other filter banks seen so far.

Suppose now that we use a symmetric window $h_{pr}[n]$. We want to find conditions
under which (3.4.5)-(3.4.6) still hold. Call $B_i$ the blocks in (3.4.5)-(3.4.6) when
no windowing is used, or $h_{pr}[n] = 1$, $n = 0, \ldots, 2N-1$, and $A_i$ the blocks
with a general symmetric window $h_{pr}[n]$. Then, we can express $A_0$ in terms of
$B_0$ as

$$A_0 = \begin{pmatrix} h_0[2N-1] & \cdots & h_0[N] \\ \vdots & & \vdots \\ h_{N-1}[2N-1] & \cdots & h_{N-1}[N] \end{pmatrix} \qquad (3.4.19)$$

$$= B_0 \begin{pmatrix} h_{pr}[2N-1] & & \\ & \ddots & \\ & & h_{pr}[N] \end{pmatrix} \qquad (3.4.20)$$

$$= B_0 \underbrace{\begin{pmatrix} h_{pr}[0] & & \\ & \ddots & \\ & & h_{pr}[N-1] \end{pmatrix}}_{W}, \qquad (3.4.21)$$

since $h_{pr}$ is symmetric, that is, $h_{pr}[n] = h_{pr}[2N-1-n]$, and $W$ denotes
the window matrix. Using the antidiagonal matrix

$$J = \begin{pmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & & & \vdots \\ 1 & 0 & \cdots & 0 \end{pmatrix},$$

it is easy to verify that $A_1$ is related to $B_1$ in a similar fashion, up to a
reversal of the entries of the window function, or

$$A_1 = B_1 J W J. \qquad (3.4.22)$$

Note also that due to the particular structure of the cosines involved, the following
are true as well:

$$B_0^T B_0 = \tfrac{1}{2}(I - J), \qquad B_1^T B_1 = \tfrac{1}{2}(I + J). \qquad (3.4.23)$$

The proof of the above fact is left as an exercise to the reader (see Problem 3.24).
Therefore, take (3.4.5) and substitute the expressions for $A_0$ and $A_1$ given in
(3.4.19)-(3.4.21) and (3.4.22):

$$A_0^T A_0 + A_1^T A_1 = W B_0^T B_0 W + J W J B_1^T B_1 J W J = I.$$

Using now (3.4.23), this becomes

$$\tfrac{1}{2} W^2 + \tfrac{1}{2} J W^2 J = I,$$

where we used the fact that $J^2 = I$, and that the cross terms $-\tfrac{1}{2}WJW$ and
$\tfrac{1}{2}JWJWJ$ cancel (both are nonzero only on the antidiagonal, with identical
entries $h_{pr}[i]\,h_{pr}[N-1-i]$). In other words, for perfect reconstruction, the
following has to hold:

$$h_{pr}^2[i] + h_{pr}^2[N-1-i] = 2, \qquad (3.4.24)$$

that is, a power complementary property. Using the expressions for $A_0$ and $A_1$,
one can easily prove that (3.4.6) holds as well.

Condition (3.4.24) also regulates the shape of the window. For example, if instead of
length $2N$ one uses a shorter window of length $2N - 2M$, then the outer $M$
coefficients of each "tail" (the symmetric nonconstant half of the window) are set to
zero, and the inner $M$ ones are set to $\sqrt{2}$ according to (3.4.24).
Table 3.4 Values of a power complementary window used for generating cosine
modulated filter banks (the window satisfies (3.4.24)). It is symmetric
($h_{pr}[16-k-1] = h_{pr}[k]$).

$h_{pr}[0] = 0.125533$    $h_{pr}[4] = 1.111680$
$h_{pr}[1] = 0.334662$    $h_{pr}[5] = 1.280927$
$h_{pr}[2] = 0.599355$    $h_{pr}[6] = 1.374046$
$h_{pr}[3] = 0.874167$    $h_{pr}[7] = 1.408631$





Figure 3.16 An example of a cosine modulated filter bank with $N = 8$. (a) Impulse
responses for the first four filters. (b) The magnitude responses of all the filters.
The symmetric prototype window is of length 16, with the first 8 coefficients given
in Table 3.4.



Example 3.12

Consider the case $N = 8$. The center frequency of the modulated filter $h_k[n]$ is
$(2k+1)2\pi/32$, and since this is a cosine modulation and the filters are real, there
is a mirror lobe at $(32-2k-1)2\pi/32$. For the filters $h_0[n]$ and $h_7[n]$, these
two lobes overlap to form a single lowpass and highpass, respectively, while
$h_1[n], \ldots, h_6[n]$ are bandpass filters. A possible symmetric window of length
16 satisfying (3.4.24) is given in Table 3.4, while the impulse responses of the
first four filters as well as the magnitude responses of all the modulated filters
are given in Figure 3.16.
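The values in Table 3.4 can be checked against (3.4.24) and against the LOT conditions (3.4.5)-(3.4.6). The Python sketch below mirrors the window from the eight tabulated coefficients, forms the $N = 8$ filters of (3.4.18), and reports both deviations; since the table has six decimals, they are small but not at machine precision. The block layout ($A_0$ from samples $2N-1, \ldots, N$, $A_1$ from samples $N-1, \ldots, 0$) and the helper names are our assumptions.

```python
import math

def mlt_filters(N, h_pr):
    """h_k[n] = h_pr[n] cos((2k+1)(2n-N+1) pi / (4N)) / sqrt(N), n = 0..2N-1 (3.4.18)."""
    return [[h_pr[n] * math.cos((2 * k + 1) * (2 * n - N + 1) * math.pi / (4 * N)) / math.sqrt(N)
             for n in range(2 * N)] for k in range(N)]

def lot_blocks(h, N):
    """A_0: samples 2N-1, ..., N of each filter; A_1: samples N-1, ..., 0."""
    A0 = [[row[n] for n in range(2 * N - 1, N - 1, -1)] for row in h]
    A1 = [[row[n] for n in range(N - 1, -1, -1)] for row in h]
    return A0, A1

def mul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]

def tr(A):
    return [list(col) for col in zip(*A)]

def lot_defect(A0, A1):
    """Largest deviation from (3.4.5) (sums equal I) and (3.4.6) (tail products vanish)."""
    n = len(A0)
    eye = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    sums = [(mul(tr(A0), A0), mul(tr(A1), A1)), (mul(A0, tr(A0)), mul(A1, tr(A1)))]
    tails = [mul(tr(A0), A1), mul(A0, tr(A1))]
    err = max(abs(P[r][c] + Q[r][c] - eye[r][c])
              for P, Q in sums for r in range(n) for c in range(n))
    return max([err] + [abs(T[r][c]) for T in tails for r in range(n) for c in range(n)])

N = 8
half = [0.125533, 0.334662, 0.599355, 0.874167, 1.111680, 1.280927, 1.374046, 1.408631]
h_pr = half + half[::-1]  # symmetric window: h_pr[16 - k - 1] = h_pr[k]
pc = max(abs(h_pr[i] ** 2 + h_pr[N - 1 - i] ** 2 - 2.0) for i in range(N))  # (3.4.24)
A0, A1 = lot_blocks(mlt_filters(N, h_pr), N)
print(pc, lot_defect(A0, A1))  # small, limited by the six-decimal table entries
```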



Note that orthogonal cosine modulated filter banks have recently been generalized to
lengths $L = KN$, where $K$ can be larger than 2. For more details, refer to
[159, 188, 235, 308].
3.5 Pyramids and Overcomplete Expansions

In this section, we will consider expansions that are overcomplete, that is, the set
of functions used in the expansion is larger than actually needed. In other words,
even if the functions play the role of a set of "basis functions," they are actually
linearly dependent. Of course, we are again interested in structured overcomplete
expansions and will consider the ones implementable with filter banks. In filter bank
terminology, overcomplete means we have a noncritically sampled filter bank, such as
the one given in Figure 3.15.

In compression applications, such redundant representations tend to be avoided, even
though an early example of a multiresolution overcomplete decomposition (the pyramid
scheme to be discussed below) has been used for compression. Such schemes are also
often called hierarchical transforms in the compression literature.

In some other applications, overcomplete expansions might be more appropriate than
bases. One of the advantages of such expansions is that, due to oversampling, the
constraints on the filters used are relaxed. This can result in filters of superior
quality compared with those in critically sampled systems. Another advantage is that
time variance can be reduced or, in the extreme case of no downsampling, avoided. One
such example is the oversampled discrete-time wavelet series, which is also explained
in what follows.

3.5.1 Oversampled Filter Banks 

The simplest way to obtain a noncritically sampled filter bank is not to sample at
all, producing an overcomplete expansion. Thus, let us consider a two-channel filter
bank with no downsampling. In the scheme given in Figure 3.15 this means that
$N = 2$ and $M = 1$. Then, the output is (see also Example 5.2)

$$\hat{X}(z) = [G_0(z) H_0(z) + G_1(z) H_1(z)]\, X(z), \qquad (3.5.1)$$

and perfect reconstruction is easily achievable. For example, in the FIR case, if
$H_0(z)$ and $H_1(z)$ have no zeros in common (that is, the polynomials in $z^{-1}$
are coprime), then one can use Euclid's algorithm [32] to find $G_0(z)$ and $G_1(z)$
such that

$$G_0(z) H_0(z) + G_1(z) H_1(z) = 1$$

is satisfied, leading to $\hat{X}(z) = X(z)$ in (3.5.1). Note how coprimeness of
$H_0(z)$ and $H_1(z)$, used in Euclid's algorithm, is also a very natural requirement
in terms of signal processing. A common zero would prohibit FIR reconstruction, or
even IIR reconstruction (if the common zero is on the unit circle). Another case
appears when we have two filters $G_0(z)$ and $G_1(z)$ which have unit norm and
satisfy

$$G_0(z) G_0(z^{-1}) + G_1(z) G_1(z^{-1}) = 2, \qquad (3.5.2)$$
since then, with $H_0(z) = G_0(z^{-1})$ and $H_1(z) = G_1(z^{-1})$, one obtains

$$\hat{X}(z) = [G_0(z) G_0(z^{-1}) + G_1(z) G_1(z^{-1})]\, X(z) = 2X(z).$$
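For the coprime FIR case above, Euclid's algorithm is the extended Euclidean algorithm for polynomials. The Python sketch below runs it with exact rational arithmetic on polynomials in $z^{-1}$ (coefficient lists, lowest power first); the helper names and the example $H_0(z) = (1+z^{-1})^2$, $H_1(z) = 1 - z^{-1}$ are illustrative. The solution it returns is only one of many, since any polynomial multiple of $(H_1, -H_0)$ can be added to $(G_0, G_1)$.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (keep at least one entry)."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def padd(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)])

def pmul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)

def pdivmod(p, q):
    """Polynomial long division p = s*q + r with deg r < deg q (exact, via Fraction)."""
    r, s = trim(p), [Fraction(0)] * max(len(p) - len(q) + 1, 1)
    while len(r) >= len(q) and any(c != 0 for c in r):
        shift, coef = len(r) - len(q), r[-1] / q[-1]
        s[shift] += coef
        r = padd(r, pmul([Fraction(0)] * shift + [-coef], q))
    return s, r

def complementary_filters(h0, h1):
    """Extended Euclid: find G0, G1 with G0*H0 + G1*H1 = 1 (polynomials in z^-1)."""
    r0, r1 = trim([Fraction(c) for c in h0]), trim([Fraction(c) for c in h1])
    u0, u1, v0, v1 = [Fraction(1)], [Fraction(0)], [Fraction(0)], [Fraction(1)]
    while any(c != 0 for c in r1):  # invariant: u*H0 + v*H1 = r
        s, rem = pdivmod(r0, r1)
        minus_s = [-c for c in s]
        r0, r1 = r1, rem
        u0, u1 = u1, padd(u0, pmul(minus_s, u1))
        v0, v1 = v1, padd(v0, pmul(minus_s, v1))
    if len(r0) != 1:  # gcd has degree >= 1: H0 and H1 share a zero
        raise ValueError("H0 and H1 share a zero; no FIR solution exists")
    return [c / r0[0] for c in u0], [c / r0[0] for c in v0]

# H0(z) = (1 + z^-1)^2 and H1(z) = 1 - z^-1 are coprime
g0, g1 = complementary_filters([1, 2, 1], [1, -1])
print(g0, g1, padd(pmul(g0, [1, 2, 1]), pmul(g1, [1, -1])))
# [Fraction(1, 4)] [Fraction(3, 4), Fraction(1, 4)] [Fraction(1, 1)]
```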

Writing this in the time domain (see Example 5.2), we realize that the set
$\{g_i[n-k]\}$, $i = 0, 1$, $k \in \mathbb{Z}$, forms a tight frame for
$l_2(\mathbb{Z})$ with a redundancy factor $R = 2$. The fact that $\{g_i[n-k]\}$ form
a tight frame simply means that they can uniquely represent any sequence from
$l_2(\mathbb{Z})$ (see also Section 5.3). However, the basis vectors are not linearly
independent and thus they do not form an orthonormal basis. The redundancy factor
indicates the oversampling rate; we can indeed check that it is two in this case,
that is, there are twice as many basis functions as actually needed to represent
sequences from $l_2(\mathbb{Z})$. This is easily seen if we remember that until now
we needed only the even shifts of $g_i[n]$ as basis functions, while now we use the
odd shifts as well. Also, the expansion formula in a tight frame is similar to that in
the orthogonal case, except for the redundancy (which means the functions in the
expansion are not linearly independent). There is an energy conservation relation,
or Parseval's formula, which says that the energy of the expansion coefficients
equals $R$ times the energy of the original. In our case, calling $y_i[n]$ the output
of the filter $h_i[n]$, we can verify (Problem 3.26) that

$$\|x\|^2 = \tfrac{1}{2}\left(\|y_0\|^2 + \|y_1\|^2\right). \qquad (3.5.3)$$

To design such a tight frame for $l_2(\mathbb{Z})$ based on filter banks, that is, to
find solutions to (3.5.2), one can find a unit norm¹² filter $G_0(z)$ which satisfies

$$0 \le |G_0(e^{j\omega})|^2 \le 2,$$

and then take the spectral factorization of the difference
$2 - G_0(z) G_0(z^{-1}) = G_1(z) G_1(z^{-1})$ to find $G_1(z)$. Alternatively, note
that (3.5.2) means the $2 \times 1$ vector $(G_0(z)\;\; G_1(z))^T$ is lossless, and
one can use a lattice structure for its factorization, just as in the $2 \times 2$
lossless case [308]. On the unit circle, (3.5.2) becomes

$$|G_0(e^{j\omega})|^2 + |G_1(e^{j\omega})|^2 = 2,$$

that is, $G_0(z)$ and $G_1(z)$ are power complementary. Note that (3.5.2) is less
restrictive than the usual orthogonal solutions we have seen in Section 3.2.3. For
example, odd-length filters are possible.
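A minimal numerical instance of (3.5.2) and (3.5.3): the Python sketch below uses the Haar pair $G_0(z) = (1+z^{-1})/\sqrt{2}$, $G_1(z) = (1-z^{-1})/\sqrt{2}$ (our choice of example, not from the text), runs the nondownsampled two-channel bank on a short signal, and checks that the coefficient energy is twice the signal energy and that synthesis returns $2x$. The analysis filters are made causal by a one-sample delay, $H_i(z) = z^{-1} G_i(z^{-1})$, so the reconstruction is $2x[n-1]$.

```python
import math

def conv(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, u in enumerate(a):
        for j, v in enumerate(b):
            out[i + j] += u * v
    return out

s = 1 / math.sqrt(2)
g0, g1 = [s, s], [s, -s]        # unit norm, |G0|^2 + |G1|^2 = 2 on the unit circle
h0, h1 = g0[::-1], g1[::-1]     # causal time reversal: H_i(z) = z^-1 G_i(z^-1)

x = [2.0, -1.0, 0.5, 3.0, 1.5, -2.5, 4.0]
y0, y1 = conv(x, h0), conv(x, h1)  # no downsampling: twice as many coefficients

energy_ratio = (sum(v * v for v in y0) + sum(v * v for v in y1)) / sum(v * v for v in x)
x_hat = [a + b for a, b in zip(conv(y0, g0), conv(y1, g1))]
print(energy_ratio)  # 2 up to round-off, as in (3.5.3)
print(x_hat[1:-1])   # 2 x[n], delayed by one sample because of the causal h_i
```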

Of course, one can iterate such nondownsampled two-channel filter banks and obtain
more general solutions. In particular, by adding two-channel nondownsampled filter
banks with filters $\{H_0(z^2), H_1(z^2)\}$ to the lowpass analysis channel and
iterating (raising $z$ to the appropriate power), one can devise a discrete-time
wavelet series. This is a very redundant expansion, since there is no downsampling.
However, unlike the critically sampled wavelet series, this expansion is
shift-invariant and is useful in applications where shift invariance is a requirement
(for example, object recognition).

¹² Note that the unit norm requirement is not necessary for constructing a tight
frame.

Figure 3.17 Pyramid scheme involving a coarse lowpass approximation and a difference
between the coarse approximation and the original. We show the case where an
orthogonal filter is used and therefore the coarse version (after interpolation) is a
projection onto $V_1$, while the difference is a projection onto $W_1$. This
indicates the multiresolution behavior of the pyramid.

More general cases of noncritically sampled filter banks, that is, $N$-channel filter
banks with downsampling by $M$ where $M < N$, have not been much studied (except for
the Fourier case discussed below). While some design methods are possible (for
example, embedding into larger lossless systems), there are still open questions.

3.5.2 Pyramid Scheme 

In computer vision and image coding, a successive approximation or multiresolution
technique called an image pyramid is frequently used. This scheme was introduced by
Burt and Adelson [41] and was recognized by the wavelet community to have a strong
connection to multiresolution analysis as well as orthonormal bases of wavelets. It
consists of deriving a low-resolution version of the original, then predicting the
original based on the coarse version, and finally taking the difference between the
original and the prediction (see Figure 3.17). At the reconstruction, the prediction
is added back to the difference, guaranteeing perfect reconstruction. A shortcoming
of this scheme is the oversampling, since we end up with a low-resolution version and
a full-resolution difference signal (at the initial rate). Obviously, the scheme can
be iterated, decomposing the coarse version repeatedly, to obtain a coarse version at
level $J$ plus $J$ detailed versions. From the above description, it is obvious that
the scheme is inherently multiresolution. Consider, for example, the coarse and
detailed versions at the first level (one stage). The coarse version is now at twice
the scale (downsampling has contracted it by 2) and half the resolution (information
loss has occurred), while the detailed version is also of half resolution but of the
same scale as the original. Also, a successive approximation flavor is easily seen:
one could start with the coarse version at level $J$ and, by adding difference
signals, obtain versions at levels $J-1, \ldots, 1, 0$ (that is, the original).

An advantage of the pyramid scheme in image coding is that nonlinear interpolation
and decimation operators can be used. A disadvantage, however, as we have already
mentioned, is that the scheme is oversampled, although the overhead in number of
samples decreases as the dimensionality increases. In $n$ dimensions, the
oversampling $s$ as a function of the number of levels $L$ in the pyramid is given by

$$s = \sum_{i=0}^{L-1} \left(\frac{1}{2^n}\right)^i < \frac{1}{1 - 2^{-n}},$$

which is an overhead of 50 to 100% in one dimension. It goes down to 25 to 33% in two
dimensions, and further down to 12.5 to 14% in three dimensions. However, we will
show below [240, 319] that if the system is linear and the lowpass filter is
orthogonal to its even translates, then one can actually downsample the difference
signal after filtering it. In that case, the pyramid reduces exactly to a critically
downsampled orthogonal subband coding scheme.

First, the prediction of the original, based on the coarse version, is simply the
projection onto the space spanned by $\{h_0[2k - n], k \in \mathbb{Z}\}$. That is,
calling the prediction $\bar{x}$,

$$\bar{x} = H_0^T H_0\, x.$$

The difference signal is thus

$$d = (I - H_0^T H_0)\, x.$$

But, because it is a perfect reconstruction system,

$$I - H_0^T H_0 = H_1^T H_1,$$

that is, $d$ is the projection onto the space spanned by
$\{h_1[2k - n], k \in \mathbb{Z}\}$. Therefore, we can filter and downsample $d$ by 2,
since

$$H_1 H_1^T H_1 = H_1.$$

In that case, the redundancy of $d$ is removed ($d$ is now critically sampled) and the
pyramid is equivalent to an orthogonal subband coding system.

The signal $d$ can be reconstructed by upsampling by 2 and filtering with $h_1[n]$.
Then we have

$$H_1^T (H_1 H_1^T H_1)\, x = H_1^T H_1\, x = d,$$

and this, added to $\bar{x} = H_0^T H_0\, x$, is indeed equal to $x$. In the notation
of the multiresolution scheme, the prediction $\bar{x}$ is the projection onto the
space $V_1$ and $d$ is the projection onto $W_1$. This is indicated in Figure 3.17.
We have thus shown that pyramidal schemes can be critically sampled as well, that is,
in Figure 3.17 the difference signal can be followed by a filter $h_1[n]$ and a
downsampler by 2 without any loss of information.
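The argument above can be replayed with the Haar filters, which are orthogonal to their even translates. The Python sketch below (a hypothetical example with Haar $h_0$, $h_1$, a length-8 signal, and illustrative operator names `H0`, `H0T`, `H1`, `H1T`) builds the oversampled pyramid, then critically samples the difference signal with $h_1$ and downsampling by 2, and checks that nothing is lost.

```python
import math

S = 1 / math.sqrt(2)

def H0(x):   # Haar lowpass h_0, then downsample by 2
    return [S * (x[2 * k] + x[2 * k + 1]) for k in range(len(x) // 2)]

def H0T(c):  # upsample by 2, then filter: the transpose (adjoint) of H0
    return [S * v for v in c for _ in range(2)]

def H1(x):   # Haar highpass h_1, then downsample by 2
    return [S * (x[2 * k] - x[2 * k + 1]) for k in range(len(x) // 2)]

def H1T(c):  # transpose (adjoint) of H1
    out = []
    for v in c:
        out += [S * v, -S * v]
    return out

x = [4.0, 2.0, 5.0, 7.0, 1.0, -3.0, 0.0, 6.0]
coarse = H0(x)                                # half-rate coarse version
prediction = H0T(coarse)                      # x_bar = H0^T H0 x
d = [a - b for a, b in zip(x, prediction)]    # full-rate difference signal
d_crit = H1(d)                                # critically sampled difference

print(len(coarse) + len(d), len(coarse) + len(d_crit))  # 12 8: pyramid vs subband samples
recon = [a + b for a, b in zip(prediction, H1T(d_crit))]
```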

Note that we assumed an orthogonal filter and no quantization of the coarse 
version. The benefit of the oversampled pyramid comes from the fact that arbitrary 
filters (including nonlinear ones) can be used, and that quantization of the coarse 
version does not influence perfect reconstruction (see Section 7.3.2). 

This scheme is very popular in computer vision, not so much because perfect
reconstruction is desired but because it is a computationally efficient way to obtain
multiple resolutions of an image. An approximation to a Gaussian, bell-shaped filter
is often used as the lowpass filter, and because the difference signal resembles the
original filtered by the Laplace operator, such a scheme is usually called a Laplacian
pyramid.
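The following is a one-dimensional sketch of one level of an oversampled Laplacian pyramid. It assumes the binomial filter $(1, 4, 6, 4, 1)/16$ as the Gaussian-like lowpass, circular boundary handling, and a uniform quantizer on the coarse version; all of these, and the helper names `pyr_reduce`, `pyr_expand` and `quantize`, are illustrative choices. It shows the point made above: because the difference is computed from the quantized coarse version, reconstruction is exact regardless of that quantization, at the cost of storing 16 + 32 samples for 32 inputs.

```python
import random

# Assumed lowpass: binomial (1, 4, 6, 4, 1)/16, a common approximation of a Gaussian.
G = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]


def circ_filter(x, g):
    """Centered circular convolution of x with an odd-length filter g."""
    n, c = len(x), len(g) // 2
    return [sum(g[k] * x[(i - k + c) % n] for k in range(len(g))) for i in range(n)]


def pyr_reduce(x):
    """Lowpass filter, then keep every other sample (coarse version)."""
    return circ_filter(x, G)[::2]


def pyr_expand(c):
    """Upsample by 2 (insert zeros), then interpolate with 2*G."""
    up = [0.0] * (2 * len(c))
    up[::2] = c
    return circ_filter(up, [2 * g for g in G])


def quantize(v, step=0.25):
    """Coarse uniform quantizer applied to the coarse version only."""
    return [step * round(s / step) for s in v]


random.seed(1)
x = [random.uniform(0, 1) for _ in range(32)]
coarse = quantize(pyr_reduce(x))           # 16 quantized samples
pred = pyr_expand(coarse)                  # prediction from the quantized coarse version
d = [a - b for a, b in zip(x, pred)]       # 32-sample difference (Laplacian) signal
x_rec = [a + b for a, b in zip(pred, d)]   # decoder: expand(coarse) + d
```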

3.5.3 Overlap-Save/Add Convolution and Filter Bank Implementations 

Filter banks can be used to implement algorithms for the computation of convolu- 
tions (see also Section 6.5.1). Two classic examples are block processing schemes — 
the overlap-save and overlap-add algorithms for computing a running convolution 
[211]. Essentially, a block of input is processed at a time (typically with frequency- 
domain circular convolution) and the output is merged so as to achieve true linear 
running convolution. Since the processing advances by steps (which corresponds 
to downsampling the input by the step size), these two schemes are multirate in 
nature and have an immediate filter bank interpretation [317]. 

Overlap-Add Scheme This scheme performs the following task: Assuming a
filter of length $L$, the overlap-add algorithm takes a block of input samples of
length $M = N - L + 1$ and feeds it into a size-$N$ FFT ($N > L$). Because the linear
convolution of the block with the filter has length $M + L - 1 = N$, the size-$N$
circular convolution computed with the FFT equals this linear convolution.
Consequently, $L - 1$ output samples of each block overlap with the adjacent block of
size $M$, and these are then added together (thus the name overlap-add). One can see
that such a scheme can be implemented with an $N$-channel analysis filter bank
downsampled by $M$, followed by multiplication (convolution in the Fourier domain),
upsampling by $M$ and an $N$-channel synthesis filter bank, as shown in Figure 3.18.

For details on the computational complexity of the filter bank, refer to Sections
6.2.3 and 6.5.1. Also note that the filters used are based on the short-time
Fourier transform.
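A small sketch of overlap-add running convolution, written with the standard library only: a naive $O(N^2)$ DFT stands in for the size-$N$ FFT, and `overlap_add`, `dft` and `direct_conv` are hypothetical helper names. Each block of $M = N - L + 1$ inputs is zero-padded to $N$, multiplied in the DFT domain with the padded filter, and the length-$N$ block outputs are added at offsets that advance by $M$ (the downsampling by $M$ of Figure 3.18).

```python
import cmath


def dft(v, inverse=False):
    """Naive O(N^2) DFT (a stand-in for a size-N FFT)."""
    n = len(v)
    sign = 1 if inverse else -1
    out = [sum(v[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [o / n for o in out] if inverse else out


def overlap_add(x, h, n_fft):
    """Running convolution of x with h: blocks of M = N - L + 1 inputs, size-N DFTs."""
    L = len(h)
    M = n_fft - L + 1
    H = dft(list(h) + [0.0] * (n_fft - L))          # filter spectrum, computed once
    y = [0.0] * (len(x) + L - 1)
    for start in range(0, len(x), M):                # advance by M samples per block
        block = list(x[start:start + M])
        Xb = dft(block + [0.0] * (n_fft - len(block)))
        yb = dft([a * b for a, b in zip(Xb, H)], inverse=True)
        for i, v in enumerate(yb):                   # L - 1 tail samples overlap and add
            if start + i < len(y):
                y[start + i] += v.real
    return y


def direct_conv(x, h):
    """Reference linear convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, a in enumerate(x):
        for j, b in enumerate(h):
            y[i + j] += a * b
    return y


x = [float((3 * i) % 7 - 3) for i in range(23)]
h = [0.5, -1.0, 0.25, 2.0]
y_ola = overlap_add(x, h, n_fft=8)                  # L = 4, so M = 5 new samples per block
```

Replacing `dft` with a real FFT changes only the cost, not the structure of the computation.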

Figure 3.18 $N$-channel analysis/synthesis filter bank with downsampling by
$M$ and filtering of the channel signals. The downsampling by $M$ is equivalent
to moving the input by $M$ samples between successive computations of the
output. With filters based on the Fourier transform, and filtering of the
channels chosen to perform frequency-domain convolution, such a filter bank
implements overlap-save/add running convolution.

Overlap-Save Scheme Given a length-$L$ filter, the overlap-save algorithm performs
the following: It takes $N$ input samples and computes a circular convolution, of
which $N - L + 1$ samples are valid linear convolution outputs and $L - 1$ samples
are wrap-around effects. These $L - 1$ wrap-around samples are discarded. The
$N - L + 1$ valid ones are kept, and the algorithm moves up by $N - L + 1$ samples.
The filter bank implementation is similar to the overlap-add scheme, except that
analysis and synthesis filters are interchanged [317].
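A corresponding sketch of overlap-save, again with a naive DFT in place of the FFT and with hypothetical helper names. It assumes the usual indexing, in which the wrap-around samples of each length-$N$ circular convolution are the first $L - 1$; the input is preceded by $L - 1$ zeros so the first block has a defined history.

```python
import cmath


def dft(v, inverse=False):
    """Naive O(N^2) DFT (a stand-in for a size-N FFT)."""
    n = len(v)
    sign = 1 if inverse else -1
    out = [sum(v[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [o / n for o in out] if inverse else out


def overlap_save(x, h, n_fft):
    """Running convolution: N inputs per block, keep the N - L + 1 valid outputs."""
    L = len(h)
    step = n_fft - L + 1                              # valid outputs per block
    H = dft(list(h) + [0.0] * (n_fft - L))
    xp = [0.0] * (L - 1) + list(x) + [0.0] * n_fft    # zero history and zero tail
    out_len = len(x) + L - 1
    y, start = [], 0
    while len(y) < out_len:
        block = xp[start:start + n_fft]               # N input samples
        yb = dft([a * b for a, b in zip(dft(block), H)], inverse=True)
        y.extend(v.real for v in yb[L - 1:])          # discard the L - 1 wrap-around samples
        start += step                                 # move up by N - L + 1
    return y[:out_len]


def direct_conv(x, h):
    """Reference linear convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, a in enumerate(x):
        for j, b in enumerate(h):
            y[i + j] += a * b
    return y


x = [float((3 * i) % 7 - 3) for i in range(23)]
h = [0.5, -1.0, 0.25, 2.0]
y_ols = overlap_save(x, h, n_fft=8)
```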

Generalizations The above two schemes are examples from a general class of 
oversampled filter banks which compute running convolution. For example, the 
pointwise multiplication in the above schemes can be replaced by a true convolu- 
tion and will result in a longer overall convolution if adequately chosen. Another 
possibility is to use analysis and synthesis filters based on fast convolution algo- 
rithms other than Fourier ones. For more details, see [276, 317] and Section 6.5.1. 



3.6 Multidimensional Filter Banks 

It seems natural to ask if the results we have seen so far on expansions of one-
dimensional discrete-time signals can be generalized to multiple dimensions. This is
of theoretical interest as well as relevant in practice, since popular applications
such as image compression often rely on signal decompositions. One easy solution
to the multidimensional problem is to apply all known one-dimensional techniques
separately along one dimension at a time. Although very simple, this solution suffers
from some drawbacks: First, only separable (for example, two-dimensional) filters
are obtained in this way, leading to fairly constrained designs (nonseparable filters of
size $N_1 \times N_2$ would offer $N_1 \cdot N_2$ free design variables versus $N_1 + N_2$ in the separable
case). Second, only rectangular divisions of the spectrum are possible, though one
might need divisions that would better capture the signal's energy concentration
(for example, close to circular).

Choosing nonseparable solutions, while solving some of these problems, comes 
at a price: the design is more difficult, and the complexity is substantially higher. 

The first step toward using multidimensional techniques on multidimensional 
signals is to use the same kind of sampling as before (that is, in the case of an im- 
age, sample first along the horizontal and then along the vertical dimension), but use 
nonseparable filters. A second step consists of using nonseparable sampling as well
as nonseparable filters. This calls for the development of a new theory that starts
by pointing out the major difference between the one- and multidimensional cases:
sampling. Sampling in multiple dimensions is represented by lattices. An excellent
presentation of lattice sampling can be found in the tutorial by Dubois [86] (Ap- 
pendix 3.B gives a brief overview). Filter banks using nonseparable downsampling 
were studied in [11, 314]. The generalization of one-dimensional analysis methods 
to multidimensional filter banks using lattice downsampling was done in [155, 325]. 
The topic has been quite active recently (see [19, 47, 48, 160, 257, 264, 288]). 

In this section, we will give an overview of the field of multidimensional filter 
banks. We will concentrate mostly on two cases: the separable case with down- 
sampling by 2 in two dimensions, and the quincunx case, that is, the simplest 
multidimensional nonseparable case with overall sampling density of 2. Both of 
these cases are of considerable practical interest, since these are the ones mostly 
used in image processing applications. 

3.6.1 Analysis of Multidimensional Filter Banks 

In Appendix 3.B, a brief account of multidimensional sampling is given. Using the 
expressions given for sampling rate changes, analysis of multidimensional systems 
can be performed in a similar fashion to their one-dimensional counterparts. Let 
us start with the simplest case, where both the filters and the sampling rate change 
are separable. 

Example 3.13 Separable Case with Sampling by 2 in Two Dimensions 

If one uses the scheme as in Figure 3.19, then all one-dimensional results are trivially extended
to two dimensions. However, all limitations appearing in one dimension will appear in
two dimensions as well. For example, we know that there are no real two-channel FIR perfect
reconstruction filter banks that are orthogonal and linear phase at the same time (apart from
trivial cases such as the Haar filters). This implies that the same will hold in two dimensions
if separable filters are used.

Alternatively, one could still sample separately (see Figure 3.20(a)) and yet use 






Figure 3.19 Separable filter bank in two dimensions, with separable downsampling
by 2. (a) Cascade of horizontal and vertical decompositions, producing the LL, LH,
HL and HH channels. (b) Division of the frequency spectrum.

Figure 3.20 Two often used lattices. (a) Separable sampling by 2 in two
dimensions. (b) Quincunx sampling.



nonseparable filters. In other words, one could have a direct four-channel implementation
of Figure 3.19 where the four filters could be $H_0, H_1, H_2, H_3$. While before
$H_i(z_1,z_2) = H_{i_1}(z_1)\, H_{i_2}(z_2)$, where $H_{i_1}(z)$ and $H_{i_2}(z)$ are one-dimensional filters, $H_i(z_1,z_2)$ is now
a true two-dimensional filter. This solution, while more general, is more complex to design and
implement. It is possible to obtain an orthogonal linear phase FIR solution [155, 156], which
cannot be achieved using separable filters (see Example 3.15 below).
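To make the separable case concrete, here is a sketch of the four-channel scheme of Figure 3.19, assuming Haar filters along both dimensions and labeling each subband by its horizontal then vertical band (a labeling convention chosen here, not taken from the figure). Rows are split first, then columns; each of the four equivalent two-dimensional filters is the product $H_{i_1}(z_1)H_{i_2}(z_2)$ of one-dimensional filters, which is why all one-dimensional results carry over, limitations included.

```python
import math
import random

S = 1 / math.sqrt(2)   # assumed Haar filters h0 = (1, 1)/sqrt(2), h1 = (1, -1)/sqrt(2)


def haar(v):
    """1-D two-channel analysis: lowpass and highpass outputs, each downsampled by 2."""
    lo = [S * (v[2 * k] + v[2 * k + 1]) for k in range(len(v) // 2)]
    hi = [S * (v[2 * k] - v[2 * k + 1]) for k in range(len(v) // 2)]
    return lo, hi


def ihaar(lo, hi):
    """1-D two-channel synthesis (upsample by 2 and filter), inverse of haar()."""
    out = []
    for a, b in zip(lo, hi):
        out += [S * (a + b), S * (a - b)]
    return out


def transpose(m):
    return [list(r) for r in zip(*m)]


def analysis_2d(x):
    """Rows first (horizontal), then columns (vertical); key = horizontal + vertical band."""
    rows = [haar(r) for r in x]
    bands = {}
    for hb, part in (("L", [r[0] for r in rows]), ("H", [r[1] for r in rows])):
        cols = [haar(c) for c in transpose(part)]
        bands[hb + "L"] = transpose([c[0] for c in cols])
        bands[hb + "H"] = transpose([c[1] for c in cols])
    return bands


def synthesis_2d(bands):
    """Undo the column split, then the row split."""
    halves = {}
    for hb in ("L", "H"):
        lo_cols, hi_cols = transpose(bands[hb + "L"]), transpose(bands[hb + "H"])
        halves[hb] = transpose([ihaar(l, h) for l, h in zip(lo_cols, hi_cols)])
    return [ihaar(l, h) for l, h in zip(halves["L"], halves["H"])]


random.seed(0)
x = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(6)]
bands = analysis_2d(x)     # four 3 x 4 subbands
y = synthesis_2d(bands)    # perfect reconstruction of the 6 x 8 input
```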

Similarly to the one-dimensional case, one can define polyphase decompositions
of signals and filters. Recall that in one dimension, the polyphase decomposition of
the signal with respect to $N$ consisted simply of the subsignals that have the same indexes
modulo $N$. The generalization to multiple dimensions is given by the cosets of
a downsampling lattice. There is no natural ordering as in one dimension,
but as long as all $N$ cosets are included, the decomposition is valid. In separable
downsampling by 2 in two dimensions, we can take as coset representatives the
points $\{(0,0), (1,0), (0,1), (1,1)\}$. Then the signal $X(z_1,z_2)$ can be written as

$$X(z_1,z_2) = X_{00}(z_1^2,z_2^2) + z_1^{-1} X_{10}(z_1^2,z_2^2) + z_2^{-1} X_{01}(z_1^2,z_2^2) + z_1^{-1} z_2^{-1} X_{11}(z_1^2,z_2^2), \tag{3.6.1}$$

where

$$X_{ij}(z_1,z_2) = \sum_{m}\sum_{n} z_1^{-m} z_2^{-n}\, x[2m+i, 2n+j].$$



Thus, the polyphase component with indexes $i,j$ corresponds to a square lattice
downsampled by 2, with the origin shifted to $(i,j)$. The recombination of
$X(z_1,z_2)$ from its polyphase components as given in (3.6.1) corresponds to an
inverse polyphase transform, and its dual is therefore the forward polyphase transform.
The polyphase decomposition of analysis and synthesis filter banks follows similarly.

The synthesis filters are decomposed just as the signal (see (3.6.1)), while the
analysis filters have reverse phase. We shall not dwell longer on these
decompositions, since they follow easily from their one-dimensional counterparts but tend
to involve a bit of algebra. The result, as is to be expected, is that the output of
an analysis/synthesis filter bank can be written in terms of the input polyphase
components times the product of the polyphase matrices.
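A quick numerical check of (3.6.1), offered as a sketch: `polyphase2` extracts the four cosets $x_{ij}[m,n] = x[2m+i, 2n+j]$, `unpolyphase2` interleaves them back (the inverse polyphase transform in the sample domain), and `recombine` evaluates the right-hand side of (3.6.1) at a complex point. The signal is assumed to be finitely supported with indexes starting at $(0,0)$; all helper names are illustrative.

```python
import random


def ztrans2(x, z1, z2):
    """X(z1, z2) = sum_{n1, n2} x[n1][n2] z1^-n1 z2^-n2 (finite support starting at 0)."""
    return sum(v * z1 ** (-n1) * z2 ** (-n2)
               for n1, row in enumerate(x) for n2, v in enumerate(row))


def polyphase2(x):
    """Polyphase components x_ij[m][n] = x[2m + i][2n + j], (i, j) in {0, 1}^2."""
    return {(i, j): [row[j::2] for row in x[i::2]] for i in (0, 1) for j in (0, 1)}


def unpolyphase2(comps, rows, cols):
    """Inverse polyphase transform: put every coset back on its shifted lattice."""
    x = [[0.0] * cols for _ in range(rows)]
    for (i, j), c in comps.items():
        for m, row in enumerate(c):
            for n, v in enumerate(row):
                x[2 * m + i][2 * n + j] = v
    return x


def recombine(comps, z1, z2):
    """Right-hand side of (3.6.1): sum_ij z1^-i z2^-j X_ij(z1^2, z2^2)."""
    return sum(z1 ** (-i) * z2 ** (-j) * ztrans2(c, z1 ** 2, z2 ** 2)
               for (i, j), c in comps.items())


random.seed(0)
x = [[random.uniform(-1, 1) for _ in range(7)] for _ in range(5)]
comps = polyphase2(x)
```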

The output of the system could also be written in terms of modulated versions
of the signal and filters. For example, downsampling by 2 in two dimensions and
then upsampling by 2 again (zeroing out all samples except the ones where both
indexes are even) can be written in the z-domain as

$$\frac{1}{4}\left(X(z_1,z_2) + X(-z_1,z_2) + X(z_1,-z_2) + X(-z_1,-z_2)\right).$$
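The modulation expression above can be checked directly, as a sketch: zero out every sample of a finite test array unless both indexes are even, and compare its z-transform with the average of the four modulated versions. The array size, evaluation points and helper names are arbitrary choices for illustration.

```python
import random


def ztrans2(x, z1, z2):
    """X(z1, z2) = sum_{n1, n2} x[n1][n2] z1^-n1 z2^-n2 (finite support starting at 0)."""
    return sum(v * z1 ** (-n1) * z2 ** (-n2)
               for n1, row in enumerate(x) for n2, v in enumerate(row))


def keep_even_even(x):
    """Down- then upsample by 2 in both dimensions: zero unless both indexes are even."""
    return [[v if n1 % 2 == 0 and n2 % 2 == 0 else 0.0 for n2, v in enumerate(row)]
            for n1, row in enumerate(x)]


def modulated_average(x, z1, z2):
    """(1/4)(X(z1,z2) + X(-z1,z2) + X(z1,-z2) + X(-z1,-z2))."""
    return 0.25 * sum(ztrans2(x, s1 * z1, s2 * z2) for s1 in (1, -1) for s2 in (1, -1))


random.seed(0)
x = [[random.uniform(-1, 1) for _ in range(6)] for _ in range(5)]
```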



Therefore, it is easy to verify that the output of a four-channel filter bank with
separable downsampling by 2 can be written as

$$Y(z_1,z_2) = \frac{1}{4}\, \mathbf{g}^T(z_1,z_2)\, H_m(z_1,z_2)\, \mathbf{x}_m(z_1,z_2), \tag{3.6.2}$$

where

$$\mathbf{g}^T(z_1,z_2) = \begin{pmatrix} G_0(z_1,z_2) & G_1(z_1,z_2) & G_2(z_1,z_2) & G_3(z_1,z_2) \end{pmatrix},$$

$$H_m(z_1,z_2) = \begin{pmatrix} H_0(z_1,z_2) & H_0(-z_1,z_2) & H_0(z_1,-z_2) & H_0(-z_1,-z_2) \\ H_1(z_1,z_2) & H_1(-z_1,z_2) & H_1(z_1,-z_2) & H_1(-z_1,-z_2) \\ H_2(z_1,z_2) & H_2(-z_1,z_2) & H_2(z_1,-z_2) & H_2(-z_1,-z_2) \\ H_3(z_1,z_2) & H_3(-z_1,z_2) & H_3(z_1,-z_2) & H_3(-z_1,-z_2) \end{pmatrix}, \tag{3.6.3}$$

$$\mathbf{x}_m(z_1,z_2) = \begin{pmatrix} X(z_1,z_2) & X(-z_1,z_2) & X(z_1,-z_2) & X(-z_1,-z_2) \end{pmatrix}^T.$$

Let us now consider an example involving nonseparable downsampling. We 
examine quincunx sampling (see Figure 3.20(b)) because it is the simplest mul- 
tidimensional nonseparable lattice. Moreover, it samples by 2, that is, it is the 
counterpart of the one-dimensional two-channel case we discussed in Section 3.2. 

Example 3.14 Quincunx Case 

It is easy to verify that, given $X(z_1,z_2)$, quincunx downsampling followed by quincunx
upsampling (that is, replacing the locations with empty circles in Figure 3.20(b) by 0)
results in a z-transform equal to $\frac{1}{2}\left(X(z_1,z_2) + X(-z_1,-z_2)\right)$. From this, it follows that
a two-channel analysis/synthesis filter bank using quincunx sampling has an input/output
relationship given by

$$Y(z_1,z_2) = \frac{1}{2} \begin{pmatrix} G_0(z_1,z_2) & G_1(z_1,z_2) \end{pmatrix} \begin{pmatrix} H_0(z_1,z_2) & H_0(-z_1,-z_2) \\ H_1(z_1,z_2) & H_1(-z_1,-z_2) \end{pmatrix} \begin{pmatrix} X(z_1,z_2) \\ X(-z_1,-z_2) \end{pmatrix}.$$

Similarly to the one-dimensional case, it can be verified that the orthogonality of the system
is achieved when the lowpass filter satisfies

$$H_0(z_1,z_2)\, H_0(z_1^{-1},z_2^{-1}) + H_0(-z_1,-z_2)\, H_0(-z_1^{-1},-z_2^{-1}) = 2, \tag{3.6.4}$$

that is, the lowpass filter is orthogonal to its shifts on the quincunx lattice. Then, a possible
highpass filter is given by

$$H_1(z_1,z_2) = -z_1^{-1}\, H_0(-z_1^{-1},-z_2^{-1}). \tag{3.6.5}$$
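As a sanity check of (3.6.4) and (3.6.5), the sketch below assumes the simplest quincunx-orthogonal lowpass, $h_0[0,0] = h_0[1,0] = 1/\sqrt{2}$ (a hypothetical choice for illustration; its two taps lie in different quincunx cosets). It builds $H_1$ from (3.6.5) and evaluates the shift inner products $H_a(z)H_b(z^{-1}) + H_a(-z)H_b(-z^{-1})$ at a few complex points: they equal 2 for $a = b$ and 0 for the cross term.

```python
import math

S = 1 / math.sqrt(2)


def H0(z1, z2):
    """Assumed quincunx lowpass: h0[0, 0] = h0[1, 0] = 1/sqrt(2)."""
    return S * (1 + 1 / z1)


def H1(z1, z2):
    """Highpass from (3.6.5): H1(z1, z2) = -z1^-1 H0(-z1^-1, -z2^-1)."""
    return -(1 / z1) * H0(-1 / z1, -1 / z2)


def pair(Ha, Hb, z1, z2):
    """Ha(z) Hb(z^-1) + Ha(-z) Hb(-z^-1), the left side of (3.6.4) when Ha = Hb."""
    return (Ha(z1, z2) * Hb(1 / z1, 1 / z2)
            + Ha(-z1, -z2) * Hb(-1 / z1, -1 / z2))


points = [(0.9 + 0.4j, -1.1 + 0.2j), (1.2j, 0.8 - 0.6j), (0.6 - 0.8j, 2.0 + 0.5j)]
```

For this particular $H_0$, (3.6.5) gives $H_1(z_1,z_2) = (1 - z_1^{-1})/\sqrt{2}$, the Haar highpass along $n_1$.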

The synthesis filters are the same up to time reversal, that is, $G_i(z_1,z_2) = H_i(z_1^{-1},z_2^{-1})$. In the
polyphase domain, define the two polyphase components of the filters as

$$H_{i0}(z_1,z_2) = \sum_{(n_1,n_2)\in\mathbb{Z}^2} h_i[n_1+n_2,\, n_1-n_2]\, z_1^{-n_1} z_2^{-n_2},$$

$$H_{i1}(z_1,z_2) = \sum_{(n_1,n_2)\in\mathbb{Z}^2} h_i[n_1+n_2+1,\, n_1-n_2]\, z_1^{-n_1} z_2^{-n_2},$$

with

$$H_i(z_1,z_2) = H_{i0}(z_1 z_2,\, z_1 z_2^{-1}) + z_1^{-1} H_{i1}(z_1 z_2,\, z_1 z_2^{-1}).$$
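A sketch that checks the two quincunx facts of this example on a random filter stored as a dictionary `{(n1, n2): coefficient}` (a representation chosen here for illustration): quincunx down- and upsampling keeps exactly the samples with $n_1 + n_2$ even and yields $\frac12(H(z_1,z_2) + H(-z_1,-z_2))$, and the two polyphase components defined above recombine into $H(z_1,z_2)$.

```python
import random


def ztrans2_dict(h, z1, z2):
    """H(z1, z2) = sum h[(n1, n2)] z1^-n1 z2^-n2 for a filter stored as a dict."""
    return sum(c * z1 ** (-n1) * z2 ** (-n2) for (n1, n2), c in h.items())


def quincunx_keep(h):
    """Quincunx down- then upsampling: keep samples with n1 + n2 even, zero the rest."""
    return {n: c for n, c in h.items() if (n[0] + n[1]) % 2 == 0}


def quincunx_polyphase(h):
    """Coefficients of H_i0 (h[n1+n2, n1-n2]) and H_i1 (h[n1+n2+1, n1-n2])."""
    p0, p1 = {}, {}
    for (m1, m2), c in h.items():
        if (m1 + m2) % 2 == 0:
            p0[((m1 + m2) // 2, (m1 - m2) // 2)] = c
        else:
            p1[((m1 - 1 + m2) // 2, (m1 - 1 - m2) // 2)] = c
    return p0, p1


random.seed(0)
h = {(n1, n2): random.uniform(-1, 1) for n1 in range(-2, 3) for n2 in range(-1, 3)}
p0, p1 = quincunx_polyphase(h)
```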



The results on alias cancellation and perfect reconstruction are very similar to
their one-dimensional counterparts. For example, perfect reconstruction with FIR
filters is achieved if and only if the determinant of the analysis polyphase matrix is
a monomial, that is,

$$\det H_p(z_1,\ldots,z_n) = c \cdot z_1^{-k_1} \cdots z_n^{-k_n}.$$
Since the results are straightforward extensions of one-dimensional results, we will
instead discuss two cases of interest in more detail, and refer the reader to [48, 163,
308, 325] for a more in-depth discussion of multidimensional results.

3.6.2 Synthesis of Multidimensional Filter Banks 

The design of nonseparable systems is more challenging than the one- dimensional 
cases. Designs based on cascade structures as well as one- to multidimensional 
transformations are discussed next. 

Cascade Structures When synthesizing filter banks, one of the most obvious 
approaches is to try to find cascade structures that would generate filters of the 
desired form. This is because (a) cascade structures usually have low complexity,
(b) higher-order filters are easily derived from lower-order ones, and (c) the
coefficients can be quantized without affecting the desired form. However, unlike in
one dimension, there are very few results on completeness of cascade structures in 
multiple dimensions. 

While cascades of orthogonal building blocks (that is, orthogonal matrices and 
diagonal delay matrices) obviously will yield orthogonal filter banks, producing 
linear phase solutions needs more care. For example, one can make use of the 
linear phase testing condition given in [155] or [163] to obtain possible cascades. 
As one possible approach, consider the generalization of the linear phase
cascade structure proposed in [155, 156, 321]. Suppose that a linear phase system
has already been designed and a higher-order one is needed. Choosing

$$H''_p(z) = R\, D(z)\, H'_p(z),$$

where $D(z) = z^{-k} J D(z^{-1}) J$ and $R$ is persymmetric ($R = JRJ$), another
linear phase system is obtained, where the filters have the same symmetry as in
$H'_p$. Although this cascade is by no means complete, it can produce very useful
filters. Let us also point out that when building cascades in the polyphase domain, 
one must bear in mind that using different sampling matrices for the same lattice 
will greatly affect the geometry of the filters obtained. 

Example 3.15 Separable Case 

Let us first present a cascade structure that will generate four linear phase/
orthogonal filters of the same size, where two of them are symmetric and the other two
antisymmetric [156]:

$$H_p(z_1,z_2) = \left[\prod_{i=K-1}^{1} R_i\, D(z_1,z_2)\right] S_0.$$
In the above, $D$ is the matrix of delays containing $(1\;\; z_1^{-1}\;\; z_2^{-1}\;\; (z_1 z_2)^{-1})$ along the
diagonal, and $R_i$ and $S_0$ are scalar persymmetric matrices, that is, they satisfy

$$R_i = J R_i J. \tag{3.6.6}$$

Equation (3.6.6), along with the requirement that the $R_i$ be unitary, allows one to design
filters that are both linear phase and orthogonal. Recall that in the two-channel one-dimensional
case these two requirements are mutually exclusive, and thus one cannot design separable filters
satisfying both properties in this four-channel two-dimensional case. This shows how using
a true multidimensional solution offers greater freedom in design. To obtain both linear
phase and orthogonality, one has to make sure that, on top of being persymmetric, the matrices
$R_i$ are unitary as well. These two requirements lead to

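where $R_{2i}$, $R_{2i+1}$ are $2 \times 2$ rotation matrices, and

$$S_0 = \begin{pmatrix} R_0 & \\ & R_1 \end{pmatrix} \begin{pmatrix} I & I \\ I & -I \end{pmatrix} \begin{pmatrix} I & \\ & J \end{pmatrix}.$$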

This cascade is a two-dimensional counterpart of the one given in [275, 321], and will be
shown to be useful in producing regular wavelets that are both linear phase and orthonormal
[165] (see Chapter 4).

Example 3.16 Quincunx Cascades 

Let us first present a cascade structure that can generate filters that are either orthogonal or
linear phase. It is obtained by the following:

For the filters to be orthogonal, the matrices $R_{ji}$ have to be unitary. For them to be linear
phase, the matrices have to be symmetric. In the latter case, the filters obtained will have opposite
symmetry. Consider, for example, the orthogonal case. The smallest lowpass filter obtained
from the above cascade would be



o[n 1 ,n 2 ] — -a 2 -a a 2 -a a 1 , (3.6.7) 
where $a_i$ are free variables, and $h_0[n_1,n_2]$ is denormalized for simplicity. The highpass filter
is obtained by modulation and time reversal (see (3.6.5)). This filter, with some additional
constraints, will be shown to be the smallest regular two-dimensional filter (the counterpart
of Daubechies' $D_2$ filter [71]). Note that this cascade has its generalization in more than
two dimensions (its one-dimensional counterpart is the lattice structure given in (3.2.60)).
One to Multidimensional Transformations Because of the difficulty of designing
good filters in multiple dimensions, transformations to map one-dimensional designs
into multidimensional ones have been used for some time, the most popular being
the McClellan transformation [88, 191].

For purely discrete-time purposes, the only requirement that we impose is that 
perfect reconstruction be preserved when transforming a one-dimensional filter bank 
into a multidimensional one. We will see later that in the context of building
continuous-time wavelet bases, one needs to preserve the order of zeros at aliasing 
frequencies. Two methods are presented: the first is based on separable polyphase 
components and the second on the McClellan transformation. 

Separable Polyphase Components A first possible transform is obtained by designing a
multidimensional filter having separable polyphase components,
given as products of the polyphase components of a one-dimensional filter [11,
47]. To be specific, consider the quincunx downsampling case. Start with a
one-dimensional filter having polyphase components $H_0(z)$ and $H_1(z)$, that is, a filter
with z-transform $H(z) = H_0(z^2) + z^{-1} H_1(z^2)$. Derive separable polyphase
components

$$H_i(z_1,z_2) = H_i(z_1)\, H_i(z_2), \qquad i = 0, 1.$$

Then, the two-dimensional filter with respect to the quincunx lattice is given as (by
upsampling the polyphase components with respect to the quincunx lattice)

$$H(z_1,z_2) = H_0(z_1 z_2)\, H_0(z_1 z_2^{-1}) + z_1^{-1} H_1(z_1 z_2)\, H_1(z_1 z_2^{-1}).$$

It can be verified that an $N$th-order zero at $\pi$ in $H(e^{j\omega})$ maps into an $N$th-order
zero at $(\pi,\pi)$ in $H(e^{j\omega_1}, e^{j\omega_2})$ (we will come back to this property in Chapter 4).
However, an orthogonal filter bank is mapped into an orthogonal two-dimensional
bank if and only if the polyphase components of the one-dimensional filter are
allpass functions (that is, $H_i(e^{j\omega}) H_i(e^{-j\omega}) = c$). Perfect reconstruction is thus
not conserved in general. Note that the separable polyphase components lead to
efficient implementations, reducing the number of operations from $O(L^2)$ to $O(L)$
per output, where $L$ is the filter size.
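The zero-mapping property can be checked numerically on a small case. The sketch assumes the one-dimensional filter $H(z) = (1 + z^{-1})^2$, which has a second-order zero at $\pi$ and polyphase components $H_0(z) = 1 + z^{-1}$, $H_1(z) = 2$; the names `poly` and `H2` are illustrative. It confirms that the quincunx filter vanishes at $(\pi,\pi)$ and that $|H|$ grows quadratically away from it (doubling the offset multiplies $|H|$ by about 4), consistent with a second-order zero. Since $H_0$ is not allpass, this example does not preserve orthogonality.

```python
import cmath
import math


def poly(c, z):
    """Evaluate sum_n c[n] z^-n."""
    return sum(cn * z ** (-n) for n, cn in enumerate(c))


# Assumed 1-D example: H(z) = (1 + z^-1)^2 = 1 + 2 z^-1 + z^-2, a 2nd-order zero at pi.
h = [1.0, 2.0, 1.0]
h_pp0, h_pp1 = h[0::2], h[1::2]      # H0(z) = 1 + z^-1, H1(z) = 2


def H2(w1, w2):
    """Quincunx filter H0(z1 z2) H0(z1/z2) + z1^-1 H1(z1 z2) H1(z1/z2) on the unit torus."""
    z1, z2 = cmath.exp(1j * w1), cmath.exp(1j * w2)
    u, v = z1 * z2, z1 / z2
    return poly(h_pp0, u) * poly(h_pp0, v) + poly(h_pp1, u) * poly(h_pp1, v) / z1


def growth_ratio(e, diagonal=False):
    """|H| at offset 2e from (pi, pi) divided by |H| at offset e."""
    far = H2(math.pi + 2 * e, math.pi + (2 * e if diagonal else 0.0))
    near = H2(math.pi + e, math.pi + (e if diagonal else 0.0))
    return abs(far) / abs(near)
```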

McClellan Transformation [191] The second transformation is the well-known
McClellan transformation, which has recently become a popular way to design linear
phase multidimensional filter banks (see [47, 163, 257, 288] among others). The
Fourier transform of a zero-phase symmetric filter ($h[n] = h[-n]$) can be written
as a function of $\cos(n\omega)$ [211]:

$$H(\omega) = \sum_{n=0}^{L} a[n] \cos(n\omega),$$

where $a[0] = h[0]$ and $a[n] = 2h[n]$, $n \neq 0$. Using Tchebycheff polynomials, one can
replace $\cos(n\omega)$ by $T_n[\cos(\omega)]$, where $T_n[\cdot]$ is the $n$th Tchebycheff polynomial, and
thus $H(\omega)$ can be written as a polynomial of $\cos(\omega)$:

$$H(\omega) = \sum_{n=0}^{L} a[n]\, T_n[\cos(\omega)].$$

The idea of the McClellan transformation is to replace $\cos(\omega)$ by a zero-phase
two-dimensional filter $F(\omega_1,\omega_2)$. This results in an overall zero-phase two-dimensional
filter [88, 191]

$$H(\omega_1,\omega_2) = \sum_{n=0}^{L} a[n]\, T_n[F(\omega_1,\omega_2)].$$

In the context of filter banks, this transformation can only be applied to the
biorthogonal case (because of the zero-phase requirement). Typically, in the case
of quincunx downsampling, $F(\omega_1,\omega_2)$ is chosen as [57]

$$F(\omega_1,\omega_2) = \frac{1}{2}\left(\cos(\omega_1) + \cos(\omega_2)\right). \tag{3.6.8}$$

That perfect reconstruction is preserved can be checked by considering the
determinant of the polyphase matrix. This is a monomial in the one-dimensional
case, since one starts with a perfect reconstruction filter bank. The transformation
in (3.6.8) leads to a determinant that is also a monomial, and thus perfect
reconstruction is conserved.

In addition, it is easy to see that pairs of zeros at $\pi$ (factors of the form
$1 + \cos(\omega)$) map into zeros of order two at $(\pi,\pi)$ in the transformed domain (or
factors of the form $1 + \cos(\omega_1)/2 + \cos(\omega_2)/2$).
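A minimal sketch of the McClellan transformation with the quincunx choice (3.6.8). The one-dimensional prototype is an assumption for illustration: the zero-phase binomial filter $h = (1, 4, 6, 4, 1)/16$, for which $H(\omega) = ((1 + \cos\omega)/2)^2$ has two factors $1 + \cos\omega$ at $\pi$. The helpers `cheb`, `H_1d` and `H_2d` are hypothetical names. The sketch checks that the transformed filter equals $((1 + F)/2)^2$, reduces to the prototype on the diagonal $\omega_1 = \omega_2$ (where $F = \cos\omega$), and vanishes at $(\pi,\pi)$.

```python
import math


def cheb(n, t):
    """Tchebycheff polynomial T_n(t) via T_{n+1} = 2 t T_n - T_{n-1}."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, t
    for _ in range(n - 1):
        prev, cur = cur, 2 * t * cur - prev
    return cur


def cos_coeffs(h_half):
    """a[0] = h[0], a[n] = 2 h[n] for a zero-phase filter with h[n] = h[-n]."""
    return [h_half[0]] + [2 * c for c in h_half[1:]]


def H_1d(h_half, w):
    """H(w) = sum_n a[n] cos(n w)."""
    return sum(an * math.cos(n * w) for n, an in enumerate(cos_coeffs(h_half)))


def F_quincunx(w1, w2):
    """The choice (3.6.8): F = (cos w1 + cos w2) / 2."""
    return 0.5 * (math.cos(w1) + math.cos(w2))


def H_2d(h_half, w1, w2, F=F_quincunx):
    """McClellan-transformed filter: sum_n a[n] T_n[F(w1, w2)]."""
    f = F(w1, w2)
    return sum(an * cheb(n, f) for n, an in enumerate(cos_coeffs(h_half)))


# Assumed prototype: h = (1, 4, 6, 4, 1)/16, stored as h[0], h[1], h[2].
h_half = [6 / 16, 4 / 16, 1 / 16]
```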

Therefore, the McClellan transformation is a powerful method to map one-dimensional
biorthogonal solutions to multidimensional biorthogonal solutions while
conserving zeros at aliasing frequencies. We will show how important this is
in trying to build continuous-time wavelet bases.

Remarks We have given a rapid overview of multidimensional filter bank results 
and relied on simple examples in order to give the intuition rather than developing 
the full algebraic framework. We refer the interested reader to [47, 48, 160, 163, 308], 
among others, for more details. 

3.7 Transmultiplexers and Adaptive Filtering in Subbands 
3.7.1 Synthesis of Signals and Transmultiplexers 

So far, we have been mostly interested in decomposing a given signal into com- 
ponents, from which the signal can be recovered. This is essentially an analysis 
problem.

The dual problem is to start from some components and to synthesize a signal 
from which the components can be recovered. This has some important appli- 
cations, in particular in telecommunications. For example, several users share a 
common channel to transmit information. Two obvious ways to solve the problem 
are to either multiplex in time (each user receives a time slot out of a period) or 
multiplex in frequency (each user gets a subchannel). In general, the problem can 
be seen as one of designing (orthogonal) functions that are assigned to the different 
users within a time window so that each user can use "his" function for signal- 
ing (for example, by having it on or off). Since the users share the channel, the 
functions are added together, but because of orthogonality, 13 each user can mon- 
itor "his" function at the receiving end. The next time period looks exactly the 
same. Therefore, the problem is to design an orthogonal set of functions over a 
window, possibly meeting some boundary constraints as well. Obviously, time- and 
frequency-division multiplexing are just two particular cases.

Because the system is invariant to shifts by a multiple of the
time window, it is also clear that, in discrete time, this is a multirate filter bank 
problem. Below, we describe briefly the analysis of such systems, which is very 
similar to its dual problem, as well as some applications. 

Analysis of Transmultiplexers A device synthesizing a single signal from
several signals, followed by the inverse operation of recovering the initial signals, is
usually called a transmultiplexer. This is because a main application is in telecom- 
munications for going from time-division multiplexing (TDM) to frequency-division 
multiplexing (FDM) [25]. Such a device is shown in Figure 3.21. 

It is clear that since this scheme involves multirate analysis and synthesis filter 
banks, all the algebraic tools developed for analysis/synthesis systems can be used 
here as well. We will not go through the details, since they are very similar to the 
familiar case, but will simply discuss a few key results [316]. 

It is easiest to look at the polyphase decomposition of the two filter banks, shown
in Figure 3.21(b). The definitions of $H_p(z)$ and $G_p(z)$ are as given in Section 3.2.
Note that they are of sizes $N \times M$ and $M \times N$, respectively. It is clear that the two
polyphase transforms in the middle of the system cancel each other, and therefore, 
defining the input vector as

$$\mathbf{x}(z) = \begin{pmatrix} X_0(z) & X_1(z) & \ldots & X_{N-1}(z) \end{pmatrix}^T,$$

and similarly the output vector as

$$\hat{\mathbf{x}}(z) = \begin{pmatrix} \hat{X}_0(z) & \hat{X}_1(z) & \ldots & \hat{X}_{N-1}(z) \end{pmatrix}^T,$$

we have the following input/output relationship:

$$\hat{\mathbf{x}}(z) = H_p(z)\, G_p(z)\, \mathbf{x}(z). \tag{3.7.1}$$

13 Orthogonality is not necessary, but makes the system simpler.

Figure 3.21 Transmultiplexer. (a) General scheme. (b) Polyphase-domain
implementation.

We thus immediately get the following result: 

Proposition 3.18 

In a transmultiplexer with polyphase matrices $H_p(z)$ and $G_p(z)$, the following
holds:

(a) Perfect reconstruction is achieved if and only if $H_p(z) G_p(z) = I$.

(b) There is no crosstalk between channels if and only if $H_p(z) G_p(z)$ is
diagonal.




The above result holds for any $M$ and $N$. One can show that $M \geq N$ is a necessary
condition for crosstalk cancellation and perfect reconstruction. In the critical
sampling case, that is, $M = N$, there is a simple duality result between transmultiplexers
and the analysis/synthesis systems seen earlier.

Proposition 3.19 

In the critically sampled case (number of channels equal to sampling rate 
change), a perfect reconstruction subband coding system is equivalent to a 
perfect reconstruction transmultiplexer. 

Proof 

Since $G_p(z) H_p(z) = I$ and both matrices are square, it follows that $H_p(z) G_p(z) = I$ as well.

Therefore, the design of perfect reconstruction subband coding systems and of perfect
reconstruction transmultiplexers is equivalent, at least in theory. A problem in the
transmultiplexer case is that the channel over which $y$ is transmitted can be far from
ideal. In order to highlight the potential problem, consider the following simple case:
Multiplex two signals $X_0(z)$ and $X_1(z)$ by upsampling by 2, delaying the second one
by one sample, and adding them. This gives a channel signal

$$Y(z) = X_0(z^2) + z^{-1} X_1(z^2).$$

Obviously, $X_0(z)$ and $X_1(z)$ can be recovered by a polyphase transform (downsampling
$Y(z)$ by 2 yields $X_0(z)$, downsampling $zY(z)$ by 2 yields $X_1(z)$). However,
if $Y(z)$ has been delayed by $z^{-1}$, then the two signals will be interchanged at the
output of the transmultiplexer. A solution to this problem is obtained if the signals
$X_0(z^2)$ and $X_1(z^2)$ are filtered by perfect lowpass and highpass filters, respectively,
and similarly at the reconstruction. Therefore, transmultiplexers usually use very
good bandpass filters. In practice, critical sampling is not attempted. Instead,
$N$ signals are upsampled by $M > N$ and filtered by good bandpass filters. This
higher upsampling rate allows guard bands to be placed between successive bands
carrying the useful signals and suppresses crosstalk between channels even without
using ideal filters. Note that all filter banks used in transmultiplexers are based on
modulation of a prototype window to an evenly spaced set of bandpass filters, and
can thus be very efficiently implemented using FFTs [25] (see also Section 6.2.3).
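The two-user example above can be reproduced in a few lines, as a sketch with hypothetical helper names: `multiplex` realizes $Y(z) = X_0(z^2) + z^{-1}X_1(z^2)$ by interleaving, `demultiplex` is the polyphase transform, and a single extra sample of channel delay makes the receiver hand each user the other user's data.

```python
def multiplex(x0, x1):
    """Y(z) = X0(z^2) + z^-1 X1(z^2): user 0 on even samples, user 1 on odd samples."""
    y = []
    for a, b in zip(x0, x1):
        y += [a, b]
    return y


def demultiplex(y):
    """Polyphase transform: downsample Y(z) and zY(z) by 2."""
    return y[0::2], y[1::2]


x0, x1 = [1, 2, 3, 4], [10, 20, 30, 40]
y = multiplex(x0, x1)
ok = demultiplex(y)                 # ideal channel: both users recovered
swapped = demultiplex([0] + y)      # channel adds a one-sample delay z^-1
```

With the delay, the first output channel carries a delayed copy of user 1 and the second carries user 0, which is why practical transmultiplexers rely on good bandpass filters rather than bare interleaving.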

3.7.2 Adaptive Filtering in Subbands 

A possible application of multirate filter banks is in equalization problems. The 
purpose is to estimate and apply an inverse filter (typically, a nonideal channel has 
to be compensated). The reason to use a multirate implementation rather than a 
direct time-domain version is related to computational complexity and convergence 
behavior. Since a filter bank computes a form of frequency analysis, subband
adaptive filtering is a version of frequency-domain adaptive filtering. See [263] for an
excellent overview of the topic.

We will briefly discuss a simple example. Assume that a filter with z-transform 
F{z) is to be implemented in the subbands of a two-channel perfect reconstruction 
filter bank with critical sampling. Then, it can be shown that the channel transfer 
function between the analysis and synthesis filter banks, C(z), is not diagonal in 
general [112]. That is, one has to estimate four components, two direct channel 
components, and two crossterms. These components can be relatively short (es- 
pecially the crossterms) and run at half the sampling rate, and thus, the scheme 
can be computationally attractive. Yet, the crossterms turn out to be difficult to 
estimate accurately (they correspond to aliasing terms). Therefore, it is more in- 
teresting to implement an oversampled system, that is, decompose into N channels 
and downsample by M < N. Then, the matrix C(z) can be well approximated by 
a diagonal matrix, making the estimation of the components easier. We refer to 
[112, 263], and to references therein for more details and discussions of applications 
such as acoustic echo cancellation. 

APPENDIX 3.A Lossless Systems

We have seen in (3.2.60) a very simple, yet powerful factorization yielding
orthogonal solutions and pointed to the relation to lossless systems. Here, the aim
is to give a brief review of lossless systems and of two-channel as well as $N$-channel
factorizations. Lossless systems have been thoroughly studied in classical circuit
theory. Many results, including factorizations of lossless matrices, can be found 
in the circuit theory literature, for example in the text by Belevitch [23]. For a 
description of this topic in the context of filter banks and detailed derivations of 
factorizations, we refer to [308]. 

The general definition of a paraunitary matrix is [309]

$$\tilde{H}(z)\, H(z) = cI, \qquad c \neq 0,$$

where $\tilde{H}(z) = H_*^T(z^{-1})$ and subscript $*$ means conjugation 14 of the coefficients (but
not of $z$). If all entries are stable, such a matrix is called lossless. The interpretation
of losslessness, a concept very familiar in classical circuit theory [23], is that the
energy of the signals is conserved through the system given by $H(z)$. Note that
the losslessness of $H(z)$ implies that $H(e^{j\omega})$ is unitary:

$$H^*(e^{j\omega})\, H(e^{j\omega}) = cI,$$



14 Here we give the general definition, which includes complex-valued filter coefficients, whereas 
we considered mostly the real case in the main text. 
where the superscript $*$ stands for hermitian conjugation (note that $H^*(e^{j\omega}) =
H_*^T(e^{-j\omega})$). For the scalar case (single input/single output), lossless transfer
functions are allpass filters given by [211]

$$H(z) = \frac{z^{-k}\, a_*(z^{-1})}{a(z)}, \tag{3.A.1}$$

where $k = \deg(a(z))$ (possibly, there is a multiplicative delay and scaling factor
equal to $c z^{-k}$). Thus, to any zero at $z = \alpha$ corresponds a pole at $z = 1/\alpha^*$, that
is, at a mirror location with respect to the unit circle. This guarantees a perfect
transmission at all frequencies (in amplitude) and only phase distortion. It is easy
to verify that (3.A.1) is lossless (assuming all poles inside the unit circle) since

$$\tilde{H}(z)\, H(z) = \frac{z^{k}\, a(z)}{a_*(z^{-1})} \cdot \frac{z^{-k}\, a_*(z^{-1})}{a(z)} = 1.$$
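A numerical illustration of the allpass property, as a sketch under the assumption that (3.A.1) has the form $z^{-k} a_*(z^{-1})/a(z)$ with $a(z) = \sum_n a[n] z^{-n}$. The numerator's coefficients are then the reversed, conjugated coefficients of $a(z)$. The helper `allpass` and the example coefficients are hypothetical; the checks are unit magnitude on the unit circle and the mirror-image zero at $1/\alpha^*$ of a first-order section with pole $\alpha$.

```python
import cmath


def poly_eval(c, z):
    """Evaluate sum_n c[n] z^-n."""
    return sum(cn * z ** (-n) for n, cn in enumerate(c))


def allpass(a):
    """Return H(z) = z^-k a_*(z^-1) / a(z) as a callable, with k = len(a) - 1."""
    num = [complex(v).conjugate() for v in reversed(a)]   # coefficients of z^-k a_*(z^-1)
    return lambda z: poly_eval(num, z) / poly_eval(a, z)


a = [1.0, -0.5 + 0.3j, 0.2]          # example denominator, no zeros on the unit circle
H = allpass(a)

alpha = 0.6 + 0.2j                   # first-order section: pole at alpha
H_first = allpass([1.0, -alpha])
```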

Obviously, nontrivial scalar allpass functions are IIR, and are thus not linear phase. 
Interestingly, matrix allpass functions exist that are FIR, and linear phase behavior 
is possible. Trivial examples of matrix allpass functions are unitary matrices, as 
well as diagonal matrices of delays. 
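The allpass property is easy to confirm numerically. The Python sketch below is illustrative only: it assumes the allpass form $F(z) = z^{-k} a_*(z^{-1})/a(z)$ with $a(z)$ a polynomial in $z^{-1}$ that has no zeros on the unit circle, and the helper `allpass` and all coefficient values are made up for this example. It checks that $|F(e^{j\omega})| = 1$ on a frequency grid and that a first-order section with a pole at $p$ has its zero at the mirror point $1/p^*$.

```python
import cmath
import math


def allpass(a, z):
    """Evaluate F(z) = z^{-k} a_*(z^{-1}) / a(z) at a complex point z.

    `a` holds a_0, ..., a_k of a(z) = sum_n a_n z^{-n}; a_* conjugates
    the coefficients but not z.
    """
    k = len(a) - 1
    den = sum(c * z ** (-n) for n, c in enumerate(a))                 # a(z)
    num = z ** (-k) * sum(complex(c).conjugate() * z ** n
                          for n, c in enumerate(a))                   # z^{-k} a_*(z^{-1})
    return num / den


# An arbitrary second-order example with complex coefficients.
a = [1.0, -0.5 + 0.3j, 0.2]
grid = [2 * math.pi * m / 64 for m in range(64)]
max_dev = max(abs(abs(allpass(a, cmath.exp(1j * w))) - 1.0) for w in grid)
print(f"max deviation of |F(e^jw)| from 1: {max_dev:.1e}")

# First-order section a(z) = 1 - p z^{-1}: pole at z = p, zero at z = 1/p*.
p = 0.6 + 0.3j
print(abs(allpass([1.0, -p], 1 / p.conjugate())))  # numerically zero
```

Evaluating on the unit circle is enough here, because there the numerator is $e^{-j\omega k}$ times the complex conjugate of the denominator.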

3.A.1 Two-Channel Factorizations 

We will first give an expression for the most general form of a 2 x 2 causal FIR lossless 
system of an arbitrary degree. Then, based on this, we will derive a factorization 
of a lossless system (already given in (3.2.60)). 

Proposition 3.20

The most general causal, FIR, $2 \times 2$ lossless system of arbitrary degree and real coefficients can be written in the form [309]

$$L(z) = \begin{pmatrix} L_0(z) & L_2(z) \\ L_1(z) & L_3(z) \end{pmatrix} = \begin{pmatrix} L_0(z) & c\, z^{-K} \tilde{L}_1(z) \\ L_1(z) & -c\, z^{-K} \tilde{L}_0(z) \end{pmatrix}, \qquad (3.A.2)$$

where $L_0(z)$ and $L_1(z)$ satisfy the power complementary property, $c$ is a real scalar constant with $|c| = 1$, and $K$ is a large enough positive integer so as to make the entries of the right column in (3.A.2) causal.

Proof 

Let us first demonstrate the following fact: If the polyphase matrix is orthogonal, then $L_0$ and $L_1$ are relatively prime. Similarly, $L_2$ and $L_3$ are relatively prime. Let us prove the first statement (the second one follows similarly). Expand $\tilde{L}(z) L(z) = I$ as follows:

$$\tilde{L}_0(z) L_0(z) + \tilde{L}_1(z) L_1(z) = 1, \qquad (3.A.3)$$

$$\tilde{L}_0(z) L_2(z) + \tilde{L}_1(z) L_3(z) = 0, \qquad (3.A.4)$$

$$\tilde{L}_2(z) L_0(z) + \tilde{L}_3(z) L_1(z) = 0, \qquad (3.A.5)$$

$$\tilde{L}_2(z) L_2(z) + \tilde{L}_3(z) L_3(z) = 1. \qquad (3.A.6)$$

Suppose now that $L_0$ and $L_1$ are not coprime, and call their common factor $P(z)$, that is, $L_0(z) = P(z) L_0'(z)$, $L_1(z) = P(z) L_1'(z)$. Substituting this into (3.A.3) gives

$$\tilde{P}(z) P(z) \cdot \left( \tilde{L}_0'(z) L_0'(z) + \tilde{L}_1'(z) L_1'(z) \right) = 1,$$

whose left side goes to 0 at all zeros of $P(z)$, contradicting the fact that the right side is identically 1.

Consider (3.A.4). Since $L_0$ and $L_1$, as well as $L_2$ and $L_3$, are coprime, we have that $L_3(z) = C_1 z^{-K} \tilde{L}_0(z)$ and $L_2(z) = C_2 z^{-K'} \tilde{L}_1(z)$, where $K$ and $K'$ are large enough integers to make $L_3$ and $L_2$ causal. Take now (3.A.5). This implies that $K = K'$ and $C_1 = -C_2$. Finally, (3.A.3) or (3.A.6) imply that $C_1 = \pm 1$.
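Proposition 3.20 can be checked numerically: starting from any power complementary pair, the right column built as in (3.A.2) should make $L(e^{j\omega})$ unitary at every frequency. The Python sketch below assumes real coefficients and takes $K$ equal to the common degree of $L_0$ and $L_1$, so that $z^{-K}\tilde{L}(z)$ is just the reversed coefficient list. As an example left column it uses the even and odd polyphase components of the length-4 Daubechies lowpass filter; that choice and the helper names are illustrative only.

```python
import cmath
import math


def evaluate(coeffs, z):
    """Evaluate a causal polynomial sum_n coeffs[n] z^{-n} at z."""
    return sum(c * z ** (-n) for n, c in enumerate(coeffs))


def complete_right_column(l0, l1, c=1.0):
    """L2 = c z^{-K} L1(z^{-1}) and L3 = -c z^{-K} L0(z^{-1}), K = degree."""
    assert len(l0) == len(l1), "sketch assumes equal lengths"
    l2 = [c * x for x in reversed(l1)]
    l3 = [-c * x for x in reversed(l0)]
    return l2, l3


def gram(cols, omega):
    """Return L^H(e^{jw}) L(e^{jw}) for L = [[L0, L2], [L1, L3]]."""
    l0, l1, l2, l3 = cols
    z = cmath.exp(1j * omega)
    m = [[evaluate(l0, z), evaluate(l2, z)],
         [evaluate(l1, z), evaluate(l3, z)]]
    return [[sum(m[r][i].conjugate() * m[r][j] for r in range(2))
             for j in range(2)] for i in range(2)]


# Power complementary left column: polyphase components of the
# length-4 Daubechies lowpass filter (an example choice).
s3, r2 = math.sqrt(3.0), math.sqrt(2.0)
h = [(1 + s3) / (4 * r2), (3 + s3) / (4 * r2),
     (3 - s3) / (4 * r2), (1 - s3) / (4 * r2)]
l0, l1 = [h[0], h[2]], [h[1], h[3]]

worst = 0.0
for c in (1.0, -1.0):                      # both admissible signs of c
    l2, l3 = complete_right_column(l0, l1, c)
    for m in range(32):
        g = gram((l0, l1, l2, l3), 2 * math.pi * m / 32)
        for i in range(2):
            for j in range(2):
                worst = max(worst, abs(g[i][j] - (1.0 if i == j else 0.0)))
print(f"max deviation from identity: {worst:.1e}")
```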

To obtain a cascade-form realization of (3.A.2), we find such a realization for the left column of (3.A.2) and then use it to derive a cascade form of the whole matrix. To that end, a result from [309] will be used. It states that for two real-coefficient polynomials $P_{K-1}$ and $Q_{K-1}$ of degree $K - 1$, with $p_{K-1}(0)\, p_{K-1}(K-1) \neq 0$ (and $P_{K-1}$, $Q_{K-1}$ power complementary), there exists another pair $P_{K-2}$, $Q_{K-2}$ such that

$$\begin{pmatrix} P_{K-1}(z) \\ Q_{K-1}(z) \end{pmatrix} = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} \begin{pmatrix} P_{K-2}(z) \\ z^{-1} Q_{K-2}(z) \end{pmatrix}. \qquad (3.A.7)$$



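Repeatedly applying the above result to (3.A.2), one obtains the lattice factorization given in (3.2.60), that is,

$$\begin{pmatrix} L_0(z) & L_2(z) \\ L_1(z) & L_3(z) \end{pmatrix} = \begin{pmatrix} \cos\alpha_0 & -\sin\alpha_0 \\ \sin\alpha_0 & \cos\alpha_0 \end{pmatrix} \prod_{i=1}^{K-1} \left[ \begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix} \begin{pmatrix} \cos\alpha_i & -\sin\alpha_i \\ \sin\alpha_i & \cos\alpha_i \end{pmatrix} \right].$$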



A very important point is that the above structure is complete, that is, all orthogonal systems with filters of length 2K can be generated in this fashion. The lattice factorization was given in Figure 3.6.
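The lattice can be multiplied out directly with polynomial arithmetic. This is a minimal Python sketch, not a reference implementation. It builds $R(\alpha_0) \prod_{i=1}^{K-1} \mathrm{diag}(1, z^{-1}) R(\alpha_i)$ for arbitrary, made-up angles and reads a filter off the left column. The polyphase convention $H(z) = L_0(z^2) + z^{-1} L_1(z^2)$ is an assumption of the sketch. The check is that the resulting length-$2K$ filter is orthogonal to its even shifts whatever the angles are.

```python
import math


def poly_mul(p, q):
    """Product of two polynomials in z^{-1} given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q):
    """Sum of two coefficient lists, zero-padding the shorter one."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0.0) + (q[i] if i < len(q) else 0.0)
            for i in range(n)]


def mat_mul(A, B):
    """Product of 2x2 matrices whose entries are polynomials in z^{-1}."""
    return [[poly_add(poly_mul(A[i][0], B[0][j]), poly_mul(A[i][1], B[1][j]))
             for j in range(2)] for i in range(2)]


def rotation(alpha):
    c, s = math.cos(alpha), math.sin(alpha)
    return [[[c], [-s]], [[s], [c]]]


DELAY = [[[1.0], [0.0]], [[0.0], [0.0, 1.0]]]  # diag(1, z^{-1})


def lattice(alphas):
    """R(alpha_0) * prod_{i=1}^{K-1} diag(1, z^{-1}) R(alpha_i)."""
    L = rotation(alphas[0])
    for a in alphas[1:]:
        L = mat_mul(mat_mul(L, DELAY), rotation(a))
    return L


def filter_from_column(l0, l1):
    """Interleave polyphase components: H(z) = L0(z^2) + z^{-1} L1(z^2)."""
    h = [0.0] * (len(l0) + len(l1))
    h[0::2] = l0
    h[1::2] = l1
    return h


alphas = [0.3, -1.1, 0.7]                  # arbitrary angles, K = 3
L = lattice(alphas)
h = filter_from_column(L[0][0], L[1][0])   # length 2K = 6
corrs = [sum(h[n] * h[n - 2 * l] for n in range(2 * l, len(h)))
         for l in range(len(h) // 2)]
print("h =", [round(v, 4) for v in h])
print("even-shift correlations:", [round(c, 12) for c in corrs])
```

Orthogonality of the filter to its even shifts follows because the left column of a lossless matrix is power complementary. The angles can therefore be varied freely, for instance by an optimizer, without leaving the set of orthogonal solutions.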

3.A.2 Multichannel Factorizations 

Here, we will present a number of ways in which one can design $N$-channel orthogonal systems. Some of the results are based on lossless factorizations (for factorizations of unitary matrices, see Appendix 2.B in Chapter 2).



Givens Factorization. We have seen in Appendix 3.A.1 a lattice factorization for the two-channel case. Besides delays, the key building blocks were $2 \times 2$ rotation matrices, also called Givens rotations. An extension of that construction holds in the $N$-channel case as well. More precisely, a real lossless FIR matrix $L(z)$ of size $N \times N$ can be written as [306]

$$L(z) = U_0 \prod_{i=1}^{K-1} D_i(z)\, U_i, \qquad (3.A.8)$$

where $U_1, \ldots, U_{K-1}$ are special orthogonal matrices as given in Figure 3.22(b) (each cross is a rotation as in (3.A.7)), $U_0$ is a general orthogonal matrix as given in Figure 2.13 with $n = N$, and the $D_i(z)$ are delay matrices of the form

$$D(z) = \mathrm{diag}(z^{-1}, 1, 1, \ldots, 1).$$

Figure 3.22 Factorization of a lossless matrix using Givens rotations (after [306]). (a) General lossless transfer matrix $H(z)$ of size $N \times N$. (b) Constrained orthogonal matrix for $U_1, \ldots, U_{K-1}$, where each cross represents a rotation as in (3.A.7).

Such a general, real, lossless, FIR, $N$-input $N$-output system is shown in Figure 3.22(a). Figure 3.22(b) indicates the form of the matrices $U_1, \ldots, U_{K-1}$. Note that $U_0$ is characterized by $\binom{N}{2}$ rotations [202], while the other orthogonal matrices are characterized by $N - 1$ rotations. Thus, a real FIR lossless system of degree $K - 1$ has the following number of free parameters:

$$p = (K-1)(N-1) + \binom{N}{2}.$$

It is clear that these structures are lossless, and the completeness is demonstrated 
in [85]. In order to obtain good filters, one can optimize the various angles in 
the rotation matrices, derive the filters corresponding to the resulting polyphase 
matrix, and evaluate an objective cost function measuring the quality of the filters 
(such as the stopband energy). 
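The following Python sketch illustrates the Givens structure and its parameter count. It builds $U_0$ from $\binom{N}{2}$ rotations and each of $U_1, \ldots, U_{K-1}$ from $N - 1$ rotations in adjacent planes. It then evaluates $L(e^{j\omega})$ on a frequency grid and checks unitarity. The plane orderings, the random angles and all helper names are assumptions made for the illustration; they are not meant to reproduce the exact wiring of Figure 3.22.

```python
import cmath
import itertools
import math
import random


def identity(n):
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def givens(n, i, j, theta):
    """Identity except for a rotation by theta in the (i, j) plane."""
    g = identity(n)
    c, s = math.cos(theta), math.sin(theta)
    g[i][i], g[i][j], g[j][i], g[j][j] = c, -s, s, c
    return g


def rotations(n, planes, angles):
    u = identity(n)
    for (i, j), theta in zip(planes, angles):
        u = matmul(u, givens(n, i, j, theta))
    return u


def build_factors(n, k, rng):
    """U_0 from C(N,2) rotations, U_1..U_{K-1} from N-1 rotations each."""
    all_planes = list(itertools.combinations(range(n), 2))
    chain = [(m, m + 1) for m in range(n - 1)]      # assumed plane order
    n_params = len(all_planes) + (k - 1) * len(chain)
    u0 = rotations(n, all_planes,
                   [rng.uniform(-math.pi, math.pi) for _ in all_planes])
    us = [rotations(n, chain, [rng.uniform(-math.pi, math.pi) for _ in chain])
          for _ in range(k - 1)]
    return u0, us, n_params


def response(u0, us, omega):
    """L(e^{jw}) = U_0 prod_i D(e^{jw}) U_i with D(z) = diag(z^{-1}, 1, ..., 1)."""
    d = identity(len(u0))
    d[0][0] = cmath.exp(-1j * omega)
    m = u0
    for u in us:
        m = matmul(matmul(m, d), u)
    return m


def unitarity_error(m):
    n = len(m)
    return max(abs(sum(m[r][i].conjugate() * m[r][j] for r in range(n))
                   - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


rng = random.Random(0)
N, K = 4, 3
u0, us, p = build_factors(N, K, rng)
print("free parameters:", p)   # (K-1)(N-1) + N(N-1)/2 = 6 + 6 = 12
worst = max(unitarity_error(response(u0, us, 2 * math.pi * m / 16))
            for m in range(16))
print(f"max deviation from unitarity: {worst:.1e}")
```

In an actual design, the random angles would be replaced by the variables of an optimizer that minimizes a cost such as stopband energy. Losslessness holds by construction for any angle values.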

Householder Factorization. An alternative representation of FIR lossless systems based on products of Householder matrices, which turns out to be more convenient for optimization, was presented in [312]. There it is shown that an $N \times N$ causal FIR system of degree $K - 1$ is lossless if and only if it can be written in the form

$$L_{K-1}(z) = V_{K-1}(z)\, V_{K-2}(z) \cdots V_1(z)\, L_0,$$

where $L_0$ is a general $N \times N$ unitary matrix (see Appendix 2.B) and

$$V_k(z) = I - (1 - z^{-1})\, v_k v_k^*,$$

with $v_k$ a size-$N$ vector of unit norm (recall that superscript $*$ denotes hermitian conjugation). It is easy to verify that $V_k(z)$ is lossless, since

$$\tilde{V}_k(z)\, V_k(z) = \left(I - (1 - z)\, v_k v_k^*\right) \left(I - (1 - z^{-1})\, v_k v_k^*\right) = I + v_k v_k^* \left( (z - 1) + (z^{-1} - 1) + (1 - z)(1 - z^{-1}) \right) = I,$$

where we used $v_k v_k^* v_k v_k^* = v_k v_k^*$; for the completeness issues, we refer to [312]. Note that these structures can be extended to the IIR case as well, simply by replacing the delay element $z^{-1}$ with a first-order scalar allpass section $(1 - a z^{-1})/(z^{-1} - a^*)$. Again, it is easy to verify that such structures are lossless (assuming $|a| > 1$) and completeness can be demonstrated similarly to the FIR case.
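A small numerical experiment makes the Householder argument concrete. On the unit circle, each factor is $V = (I - vv^*) + d\, vv^*$ with $|d| = 1$, so it is unitary whether $d$ is the delay $e^{-j\omega}$ or the first-order allpass section. The Python sketch below is illustrative only. It assumes random unit vectors $v_k$ and takes $L_0$ to be a Householder reflection $I - 2uu^*$, which is one convenient unitary choice. The value $a = 1.5 + 0.5j$ is an arbitrary choice satisfying $|a| > 1$.

```python
import cmath
import math
import random


def outer(v):
    """v v^* for a complex column vector v given as a list."""
    return [[x * y.conjugate() for y in v] for x in v]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def unit_vector(n, rng):
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def v_factor(v, delay):
    """I - (1 - delay) v v^*, where `delay` is the value taken by z^{-1}
    (FIR case) or by a first-order allpass section (IIR case)."""
    p = outer(v)
    n = len(v)
    return [[(1.0 if r == c else 0.0) - (1 - delay) * p[r][c] for c in range(n)]
            for r in range(n)]


def cascade(vs, l0, delay):
    """V_{K-1} ... V_1 L_0 evaluated for one value of the delay element."""
    m = l0
    for v in vs:                      # vs = [v_1, ..., v_{K-1}]
        m = matmul(v_factor(v, delay), m)
    return m


def unitarity_error(m):
    n = len(m)
    return max(abs(sum(m[r][i].conjugate() * m[r][j] for r in range(n))
                   - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


rng = random.Random(1)
N, K = 3, 4
vs = [unit_vector(N, rng) for _ in range(K - 1)]
l0 = v_factor(unit_vector(N, rng), -1.0)   # I - 2 u u^*: a unitary L_0

a = 1.5 + 0.5j                             # |a| > 1 for the IIR variant
fir_err = iir_err = 0.0
for m in range(16):
    z = cmath.exp(1j * 2 * math.pi * m / 16)
    fir_err = max(fir_err, unitarity_error(cascade(vs, l0, 1 / z)))
    section = (1 - a / z) / (1 / z - a.conjugate())   # (1 - a z^{-1}) / (z^{-1} - a^*)
    iir_err = max(iir_err, unitarity_error(cascade(vs, l0, section)))
print(f"FIR: {fir_err:.1e}, IIR: {iir_err:.1e}")
```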

Orthogonal and Linear Phase Factorizations. Recently, a factorization for a large class of paraunitary, linear phase systems has been developed [275]. It is a complete factorization for linear phase paraunitary filter banks with an even number of channels $N$ ($N > 2$), where the polyphase matrix is described by the following [321] (see also (3.2.69)):

$$H_p(z) = z^{-L}\, a\, H_p(z^{-1})\, J, \qquad (3.A.9)$$
where $a$ is the diagonal matrix of symmetries (+1 for a symmetric filter and −1 for an antisymmetric filter), $L$ is the filter length and $J$ is an antidiagonal matrix. Note that there exist linear phase systems which cannot be described by (3.A.9), but many useful solutions do satisfy it. The cascade is given by

is a unitary matrix. $S_0$, $S_1$ are unitary matrices of size $N/2$,

$$W = \frac{1}{\sqrt{2}} \begin{pmatrix} I & I \\ I & -I \end{pmatrix}, \qquad D(z) = \begin{pmatrix} I & 0 \\ 0 & z^{-1} I \end{pmatrix},$$

and the $U_i$ are all size-$(N/2)$ unitary matrices. Note that all subblocks in the above matrices are of size $N/2$. In the same paper [275], the authors develop a cascade structure for filter banks with an odd number of channels as well.

State-Space Description. It is interesting to consider the lossless property in state-space description. If we call $v[n]$ the state vector, then a state-space description is given by [150]

$$v[n+1] = A v[n] + B x[n],$$
$$y[n] = C v[n] + D x[n],$$

where $A$ is of size $d \times d$ ($d \geq K - 1$, the degree of the system), $D$ of size $M \times N$, $C$ of size $M \times d$ and $B$ of size $d \times N$. A minimal realization satisfies $d = K - 1$. The transfer function matrix is equal to

$$H(z) = D + C (zI - A)^{-1} B,$$

and the impulse response is given by

$$[D, CB, CAB, CA^2B, \ldots].$$

The fundamental nature of the losslessness property appears in the following result [304, 309]: A stable transfer matrix $H(z)$ is lossless if and only if there exists a minimal realization such that

$$R = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$$

is unitary. This gives another way to parametrize lossless transfer function matrices. In particular, $H(z)$ will be FIR if $A$ is lower triangular with a zero diagonal, and thus, it is sufficient to find orthogonal matrices with an upper right triangular corner of size $K - 1$ with only zeros to find all lossless transfer matrices of a given size and degree [85].
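The state-space view lends itself to a direct check. The Python sketch below computes the impulse response $[D, CB, CAB, CA^2B, \ldots]$ of a realization and verifies that $H(e^{j\omega})$ is unitary when $R$ is. The $3 \times 3$ orthogonal matrix $R$ is hand-picked for this illustration (an assumption, not taken from the text). It has $d = 1$ and $A = 0$, which is trivially lower triangular with zero diagonal, so the system is FIR with $H(z) = D + C z^{-1} B$.

```python
import cmath
import math


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def impulse_response(A, B, C, D, length):
    """Taps [D, CB, CAB, CA^2B, ...] of H(z) = D + C (zI - A)^{-1} B."""
    taps = [D]
    x = B                       # holds A^{n-1} B
    for _ in range(1, length):
        taps.append(matmul(C, x))
        x = matmul(A, x)
    return taps


def frequency_response(taps, omega):
    """H(e^{jw}) = sum_n h[n] e^{-jwn} for a finite list of matrix taps."""
    rows, cols = len(taps[0]), len(taps[0][0])
    return [[sum(t[r][c] * cmath.exp(-1j * omega * n) for n, t in enumerate(taps))
             for c in range(cols)] for r in range(rows)]


def unitarity_error(m):
    """Largest entry of |M^H M - I|."""
    n, k = len(m), len(m[0])
    return max(abs(sum(complex(m[r][i]).conjugate() * m[r][j] for r in range(n))
                   - (1.0 if i == j else 0.0))
               for i in range(k) for j in range(k))


alpha = 0.4                      # arbitrary angle
c, s = math.cos(alpha), math.sin(alpha)
A = [[0.0]]                      # 1x1, zero diagonal: FIR system
B = [[c, s]]
C = [[1.0], [0.0]]
D = [[0.0, 0.0], [-s, c]]
R = [A[0] + B[0], C[0] + D[0], C[1] + D[1]]   # R = [[A, B], [C, D]]

taps = impulse_response(A, B, C, D, 4)
worst = max(unitarity_error(frequency_response(taps, 2 * math.pi * m / 16))
            for m in range(16))
print("R unitary:", unitarity_error(R) < 1e-12)
print(f"max deviation of H(e^jw) from unitarity: {worst:.1e}")
```

Because $A$ is nilpotent here, all taps beyond $CB$ vanish. A random orthogonal $R$ with a nonzero upper corner would instead give an IIR lossless system.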

APPENDIX 3.B Sampling in Multiple Dimensions and Multirate Operations 

Sampling in multiple dimensions is represented by a lattice, defined as the set of all linear combinations of $n$ basis vectors $a_1, a_2, \ldots, a_n$ with integer coefficients [42, 86]; that is, a lattice is the set of all vectors generated by $Dk$, $k \in \mathbb{Z}^n$, where $D$ is the matrix characterizing the sampling process. Note that $D$ is not unique for a given sampling pattern and that two matrices representing the same sampling process are related by a linear transformation represented by a unimodular matrix [42]. We will call input and output lattice the set of points reached by $k$ and $Dk$, respectively. The input lattice is often $\mathbb{Z}^n$ (like above) but need not be.

A separable lattice is a lattice that can be represented by a diagonal matrix 
and it will appear when one-dimensional systems are used in a separable fashion 
along each dimension. The unit cell is a set of points such that the union of copies 
of the output lattice shifted to all points in the cell yields the input lattice. The 
number of input lattice points contained in the unit cell represents the reciprocal of the sampling density and is given by $N = |\det(D)|$. An important unit cell is the fundamental parallelepiped $U_c$ (the parallelepiped formed by the $n$ basis vectors). In what follows, $U_c^T$ will denote the fundamental parallelepiped of the transposed lattice. Shifting the origin of the output lattice to any of the points of the input lattice yields a coset. Clearly, there are exactly $N$ distinct cosets obtained by shifting the origin of the output lattice to all of the points of the parallelepiped. The union of all cosets for a given lattice yields the input lattice.
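The unit cell and the coset decomposition can be enumerated mechanically for small integer sampling matrices. The Python sketch below is restricted to the two-dimensional case. It lists the integer points of the fundamental parallelepiped $\{Dx : x \in [0,1)^2\}$ using exact rational arithmetic, and for any integer point it finds the coset it belongs to. The function names are hypothetical. The two matrices tried are the separable matrix $2I$ and the quincunx matrix with rows $(1, 1)$ and $(1, -1)$.

```python
from fractions import Fraction
from itertools import product


def unit_cell(D):
    """Integer points of the fundamental parallelepiped {D x : x in [0, 1)^2}."""
    (a, b), (c, d) = D
    det = a * d - b * c
    corners = [(0, 0), (a, c), (b, d), (a + b, c + d)]   # images of the unit square corners
    lo = [min(p[i] for p in corners) for i in range(2)]
    hi = [max(p[i] for p in corners) for i in range(2)]
    cell = []
    for n1, n2 in product(range(lo[0], hi[0] + 1), range(lo[1], hi[1] + 1)):
        x1 = Fraction(d * n1 - b * n2, det)               # x = D^{-1} n, exactly
        x2 = Fraction(-c * n1 + a * n2, det)
        if 0 <= x1 < 1 and 0 <= x2 < 1:
            cell.append((n1, n2))
    return cell


def coset_of(D, n, cell):
    """Return the unit-cell point u such that n - u lies on the lattice D Z^2."""
    (a, b), (c, d) = D
    det = a * d - b * c
    for u in cell:
        m1, m2 = n[0] - u[0], n[1] - u[1]
        if (d * m1 - b * m2) % det == 0 and (-c * m1 + a * m2) % det == 0:
            return u
    raise ValueError("no coset found")


D_S = ((2, 0), (0, 2))
D_Q = ((1, 1), (1, -1))
for name, D in (("separable", D_S), ("quincunx", D_Q)):
    cell = unit_cell(D)
    print(name, cell, "number of cosets:", len(cell))
```

The number of points found equals $|\det(D)|$ in both cases, in line with the statement that there are exactly $N$ cosets.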

Another important notion is that of the reciprocal lattice [42, 86]. This lattice is actually the Fourier transform of the original lattice, and its points represent the points of replicated spectra in the frequency domain. If the matrix corresponding to the reciprocal lattice is denoted by $D_r$, then $D_r^T D = I$. Observe that the absolute value of the determinant of the matrix $D$ represents the hypervolume of any unit cell of the corresponding lattice, as well as the reciprocal of the sampling density. One of the possible unit cells is the Voronoi cell, which is the set of points closer to the origin than to any other lattice point. The meaning of the unit cell in the frequency domain is extremely important since, if the signal to be sampled is bandlimited to that cell, no overlapping of spectra will occur and the signal can be reconstructed from its samples.

Let us now examine multidimensional counterparts of some operations involving sampling that are going to be used later. First, downsampling will mean that the points on the sampling lattice are kept while all the others are discarded. The time- and Fourier-domain expressions for the output of a downsampler are given by [86, 325]

$$y[n] = x[Dn], \qquad Y(\omega) = \frac{1}{N} \sum_{k \in U_c^T} X\!\left((D^T)^{-1}(\omega - 2\pi k)\right),$$

where $N = |\det(D)|$, $\omega$ is an $n$-dimensional real vector, and $n$, $k$ are $n$-dimensional integer vectors.

Next consider upsampling, that is, the process that maps a signal on the input lattice to another one that is nonzero only at the points of the sampling lattice:

$$y[n] = \begin{cases} x[D^{-1} n] & \text{if } n = Dk, \\ 0 & \text{otherwise}, \end{cases} \qquad Y(\omega) = X(D^T \omega).$$
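The time-domain definitions translate directly into code when a two-dimensional signal is stored sparsely. The Python sketch below assumes a dictionary keyed by $(n_1, n_2)$, with absent points meaning zero, and handles $2 \times 2$ integer matrices only. All names are illustrative. Downsampling keeps $x[Dk]$ and upsampling puts each sample back at $Dk$, so the cascade of the two keeps exactly the points of the output lattice.

```python
from fractions import Fraction


def lattice_point(D, k):
    """Return D k for a 2x2 integer matrix D."""
    (a, b), (c, d) = D
    return (a * k[0] + b * k[1], c * k[0] + d * k[1])


def solve(D, n):
    """Return D^{-1} n if it is an integer vector, else None."""
    (a, b), (c, d) = D
    det = a * d - b * c
    k1 = Fraction(d * n[0] - b * n[1], det)
    k2 = Fraction(-c * n[0] + a * n[1], det)
    if k1.denominator == 1 and k2.denominator == 1:
        return (int(k1), int(k2))
    return None


def downsample(x, D):
    """y[k] = x[D k] for a finitely supported signal {(n1, n2): value}."""
    y = {}
    for n, v in x.items():
        k = solve(D, n)
        if k is not None:
            y[k] = v
    return y


def upsample(y, D):
    """Place y[k] at n = D k; all other points are implicitly zero."""
    return {lattice_point(D, k): v for k, v in y.items()}


x = {(n1, n2): 1 + n1 + 10 * n2 for n1 in range(-3, 4) for n2 in range(-3, 4)}
D_S = ((2, 0), (0, 2))
D_Q = ((1, 1), (1, -1))
kept_S = upsample(downsample(x, D_S), D_S)
kept_Q = upsample(downsample(x, D_Q), D_Q)
print(sorted(kept_Q)[:4])
```

For the separable matrix, the surviving points are those with both coordinates even. For the quincunx matrix, they are those with an even coordinate sum.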

Let us finish this discussion with examples often encountered in practice. 

Example 3.17 Separable Case: Sampling by 2 in Two Dimensions 

Let us start with the separable case with sampling by 2 in each of the two dimensions. The sampling process is then represented by the following matrix:

$$D_S = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = 2I. \qquad (3.B.1)$$

The unit cell consists of the following points:

$$(n_1, n_2) \in \{(0,0), (1,0), (0,1), (1,1)\}.$$

In the z-domain, these correspond to

$$\{1,\ z_1^{-1},\ z_2^{-1},\ (z_1 z_2)^{-1}\}.$$

Its Voronoi cell is a square and the corresponding critically sampled filter bank will have $N = \det(D_S) = 4$ channels. This is the case most often used in practice in image coding, since it represents separable one-dimensional treatment of an image. Looking at it this way (in terms of lattices), however, will give us the additional freedom to design nonseparable filters even if sampling is separable. The expression for upsampling in this case is

$$Y(\omega_1, \omega_2) = X(2\omega_1, 2\omega_2),$$

while downsampling followed by upsampling gives

$$Y(\omega_1, \omega_2) = \frac{1}{4}\left( X(\omega_1, \omega_2) + X(\omega_1 + \pi, \omega_2) + X(\omega_1, \omega_2 + \pi) + X(\omega_1 + \pi, \omega_2 + \pi) \right),$$

that is, samples where both $n_1$ and $n_2$ are even are kept, while all others are set to zero.
Example 3.18 Quincunx Sampling

Consider next the quincunx case, that is, the simplest multidimensional sampling structure that is nonseparable. It is generated using, for example,

$$D_Q = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}. \qquad (3.B.2)$$

Since its determinant equals −2, that is, $N = |\det(D_Q)| = 2$, the corresponding critically sampled filter bank will have two channels. The Voronoi cell for this lattice is a diamond (tilted square). Since the reciprocal lattice for this case is again quincunx, its Voronoi cell will have the same diamond shape. This fact has been used in some image and video coding schemes [12, 320] since, if restricted to this region, (a) the spectra of the signal and its repeated occurrences that appear due to sampling will not overlap and (b) due to the fact that the human eye is less sensitive to resolution along diagonals, it is more appropriate for the lowpass filter to have a diagonal cutoff. Note that the two vectors belonging to the unit cell are

$$n_0 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \qquad n_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix},$$

while their z-domain counterparts are $1$ and $z_1^{-1}$, and are the same for the unit cell of the transposed lattice. Shifting the origin of the quincunx lattice to points determined by the unit cell vectors yields the two cosets for this lattice. Obviously, their union gives back the original lattice. Write now the expression for the output of an upsampler in the Fourier domain:

$$Y(\omega_1, \omega_2) = X(\omega_1 + \omega_2, \omega_1 - \omega_2).$$

Similarly, the output of a downsampler followed by an upsampler can be expressed as

$$Y(\omega_1, \omega_2) = \frac{1}{2}\left( X(\omega_1, \omega_2) + X(\omega_1 + \pi, \omega_2 + \pi) \right).$$

It is easy to see that all the samples at locations where $(n_1 + n_2)$ is even are kept, while those where $(n_1 + n_2)$ is odd are set to zero.
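The Fourier-domain expressions of Examples 3.17 and 3.18 can be checked against the time-domain descriptions with a brute-force two-dimensional DTFT of a finitely supported signal. In this Python sketch, the test signal and the frequency points are arbitrary choices. The downsample-then-upsample outputs are formed directly as the masks stated in the examples (both coordinates even, or even coordinate sum), and the upsampler outputs by placing $x[k]$ at $Dk$.

```python
import cmath
import math


def dtft(x, w1, w2):
    """X(w1, w2) = sum_n x[n1, n2] e^{-j(w1 n1 + w2 n2)} for finite support."""
    return sum(v * cmath.exp(-1j * (w1 * n1 + w2 * n2)) for (n1, n2), v in x.items())


# Arbitrary test signal on a 5 x 5 grid.
x = {(n1, n2): math.sin(1 + n1) + 0.5 * n2 * n2
     for n1 in range(-2, 3) for n2 in range(-2, 3)}

up_s = {(2 * k1, 2 * k2): v for (k1, k2), v in x.items()}      # D_S = 2I
up_q = {(k1 + k2, k1 - k2): v for (k1, k2), v in x.items()}    # D_Q
du_s = {n: v for n, v in x.items() if n[0] % 2 == 0 and n[1] % 2 == 0}
du_q = {n: v for n, v in x.items() if (n[0] + n[1]) % 2 == 0}

pi = math.pi
err = 0.0
for w1, w2 in [(0.3, -1.2), (2.0, 0.7), (-2.5, 1.9)]:
    X = lambda a, b: dtft(x, a, b)
    err = max(err,
              abs(dtft(up_s, w1, w2) - X(2 * w1, 2 * w2)),
              abs(dtft(up_q, w1, w2) - X(w1 + w2, w1 - w2)),
              abs(dtft(du_s, w1, w2) - 0.25 * (X(w1, w2) + X(w1 + pi, w2)
                                               + X(w1, w2 + pi) + X(w1 + pi, w2 + pi))),
              abs(dtft(du_q, w1, w2) - 0.5 * (X(w1, w2) + X(w1 + pi, w2 + pi))))
print(f"max mismatch: {err:.1e}")
```

The mismatch is at the level of floating-point rounding, because each formula is an exact identity for finitely supported signals.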
Problems

3.1 Orthogonality and completeness of the sinc basis (Section 3.1.3):

(a) Prove the orthogonality relations (3.1.27) and (3.1.28).

(b) Prove that the set $\{\varphi_k\}$ given in (3.1.24) is complete in $\ell_2(\mathbb{Z})$. Hint: Use the same argument as in Proposition 3.1. Take first the even terms and find the Fourier transform of $\langle \varphi_{2k}[n], x[n] \rangle = 0$. Do the same for the odd terms. Combining the two, you should get $\|x\| = 0$, violating the assumption and proving completeness.

3.2 Show that $g_0[n] = (1/\sqrt{2})\, \sin((\pi/2)n)/((\pi/2)n)$ and $g_1[n] = (-1)^n g_0[-n]$ and their even translates do not form an orthogonal basis for $\ell_2(\mathbb{Z})$, that is, the shift by 1 in (3.1.24) is necessary for completeness. Hint: Show incompleteness by finding a counterexample based on $\sin((\pi/2)n)$ with proper normalization.

3.3 Show that Proposition 3.3 does not hold in the nonorthogonal case, that is, there exist 
nonorthogonal time-invariant expansions with frequency selectivity. 

3.4 Prove the equivalences of (a)-(e) in Theorem 3.7. 

3.5 Based on the fact that in an orthogonal FIR filter bank the autocorrelation of the lowpass filter satisfies $P(z) + P(-z) = 2$, show that the length of the filter has to be even.

3.6 For $A(z) = (1 + z)^3 (1 + z^{-1})^3$, verify that $B(z) = \frac{1}{256}(3z^2 - 18z + 38 - 18z^{-1} + 3z^{-2})$ is the solution such that $P(z) = A(z) B(z)$ is valid. If you have access to adequate software (for example, Matlab), do the spectral factorization (obviously, only $B(z)$ needs to be factored). Give the filters of this orthogonal filter bank.

3.7 Prove the equivalences (a)-(e) in Theorem 3.8. 

3.8 Prove the three statements on the structure of linear phase solutions given in Proposition 3.11. Hint: Use $P(z) = H_0(z) G_0(z) = z^{-k} H_0(z) H_1(-z)$, and determine when it is valid.

3.9 Show that, when the filters Ho(z) and H\(z) are of the same length and linear phase, the 
linear phase testing condition given by (3.2.69), holds. Hint: Find out the form of the 
polyphase components of each linear phase filter. 

3.10 In Proposition 3.12, it was shown that there are no real symmetric/antisymmetric orthogonal FIR filter banks.

(a) Show that if the filters can be complex valued, then solutions exist.

(b) For length-6 filters, find the solution with the maximum number of zeros at $\omega = \pi$. Hint: Refactor the $P(z)$ that leads to the $D_3$ filter into complex-valued symmetric/antisymmetric filters.

3.11 Spectral factorization method for two-channel filter banks: Consider the factorization of $P(z)$ in order to obtain orthogonal or biorthogonal filter banks.

(a) Take

$$P(z) = -\frac{1}{4} z^3 + \frac{1}{2} z + 1 + \frac{1}{2} z^{-1} - \frac{1}{4} z^{-3}.$$

Build an orthogonal filter bank based on this $P(z)$. If the function is not positive on the unit circle, apply an adequate correction (see the Smith-Barnwell method in Section 3.2.3).
(b) Alternatively, compute a linear phase factorization of $P(z)$. In particular, choose $H_0(z) = z + 1 + z^{-1}$. Give the other filters in this biorthogonal filter bank.

(c) Assume now that a particular $P(z)$ was designed using the Parks-McClellan algorithm (which leads to equiripple pass and stopbands). Show that if $P(z)$ is not positive on the unit circle, then the correction to make it greater than or equal to zero places all stopband zeros on the unit circle.

3.12 Using Proposition 3.13, prove that the filter $H_0(z) = (1 + z^{-1})^N$ always has a complementary filter.

3.13 Prove that in the orthogonal lattice structure, the sum of angles has to be equal to $\pi/4$ or $5\pi/4$ in order to have one zero at $\omega = \pi$ in $H_0(e^{j\omega})$. Hint: There are several ways to prove this, but an intuitive one is to consider the sequence $x[n] = (-1)^n$ at the input, or to consider z-transforms at $z = e^{j\omega} = -1$. See also Example 3.3.

3.14 Interpolation followed by decimation: Given an input $x[n]$, consider upsampling by 2, followed by interpolation with a filter having z-transform $H(z)$ for magnification of the signal. Then, to recover the original signal size, apply filtering by a decimation filter $G(z)$ followed by downsampling by 2, in order to obtain a reconstruction $\hat{x}[n]$.

(a) What does the product filter $P(z) = H(z) \cdot G(z)$ have to satisfy in order for $\hat{x}[n]$ to be a perfect replica of $x[n]$ (possibly with a shift)?

(b) Given an interpolation filter $H(z)$, what condition does it have to satisfy so that one can find a decimation filter $G(z)$ in order to achieve perfect reconstruction? Hint: This is similar to the complementary filter problem in Section 3.2.3.

(c) For the following two filters,

$$H'(z) = 1 + z^{-1} + z^{-2} + z^{-3}, \qquad H''(z) = 1 + z^{-1} + z^{-2} + z^{-3} + z^{-4},$$

give filters $G'(z)$ and $G''(z)$ so that perfect reconstruction is achieved (if possible, give the shortest such filter; if not, say why).

3.15 Prove the orthogonality relations (3.3.16) and (3.3.17) for an octave-band filter bank, using 
similar arguments as in the proof of (3.3.15). 

3.16 Consider tree-structured orthogonal filter banks as discussed in Example 3.10, and in particular the full tree of depth 2.

(a) Assume ideal sinc filters, and give the frequency response magnitude of $G_i(e^{j\omega})$, $i = 0, \ldots, 3$. Note that this is not the natural ordering one would expect.

(b) Now take the Haar filters, and give $g_i[n]$, $i = 0, \ldots, 3$. These are the discrete-time Walsh-Hadamard functions of length 4.

(c) Given that $\{g_0[n], g_1[n]\}$ is an orthogonal pair, prove orthogonality for any of the equivalent filters with respect to shifts by 4.

3.17 In the general case of a full-grown binary tree of depth $J$, define the equivalent filters such that their indexes increase as the center frequency increases. In Example 3.10, it would mean interchanging $G_3$ with $G_2$ (see (3.3.23)).
3.18 Show that in a filter bank with linear phase filters, the iterated filters are also linear phase. In particular, consider the case where $h_0[n]$ and $h_1[n]$ are of even length, symmetric and antisymmetric, respectively. Consider a four-channel bank, with $H_a(z) = H_0(z) H_0(z^2)$, $H_b(z) = H_0(z) H_1(z^2)$, $H_c(z) = H_1(z) H_0(z^2)$, and $H_d(z) = H_1(z) H_1(z^2)$. What are the lengths and symmetries of these four filters?

3.19 Consider a general perfect reconstruction filter bank (not necessarily orthogonal). Build a tree-structured filter bank. Give and prove the biorthogonality relations for the equivalent impulse responses of the analysis and synthesis filters. For simplicity, consider a full tree of depth 2 rather than an arbitrary tree. Hint: The method is similar to the orthogonal case, except that now analysis and synthesis filters are involved.

3.20 Prove that the number of wavelet packet bases generated from a depth-J binary tree is 
equal to (3.3.25). 

3.21 Prove that the perfect reconstruction condition given in terms of the modulation matrix for the $N$-channel case is equivalent to the system being biorthogonal. Hint: Mimic the proof for the two-channel case given in Section 3.2.1.

3.22 Give the relationship between $G_p(z)$ and $G_m(z)$, which is similar to (3.4.9), as well as between $H_p(z)$ and $H_m(z)$, in the general $N$-channel case.

3.23 Consider a modulated filter bank with filters $H_0(z) = H(z)$, $H_1(z) = H(W_3 z)$, and $H_2(z) = H(W_3^2 z)$. The modulation matrix $H_m(z)$ is circulant. (Note that $W_3 = e^{-j2\pi/3}$.)

(a) Show how to diagonalize H m (z). 

(b) Give the form of the determinant det(H m (z)). 

(c) Relate the above to the special form of H p (z). 

3.24 Cosine modulated filter banks: 

(a) Prove that (3.4.5)-(3.4.6) hold for the cosine modulated filter bank with filters given in (3.4.18) and $h_{pr}[n] = 1$, $n = 0, \ldots, 2N - 1$.

(b) Prove that in this case (3.4.23) holds as well. 

Hint: Show that left and right tails are symmetric/antisymmetric, and thus the tails are 
orthogonal. 

3.25 Orthogonal pyramid: Consider a pyramid decomposition as discussed in Section 3.5.2 and shown in Figure 3.17. Now assume that $h[n]$ is an "orthogonal" filter, that is, $\langle h[n], h[n - 2l] \rangle = \delta[l]$. Perfect reconstruction is achieved by upsampling the coarse version, filtering it by $h$, and adding it to the difference signal.

(a) Analyze the above system in time domain and in z-transform domain, and show perfect reconstruction.

(b) Take $h[n] = (1/\sqrt{2})[1, 1]$. Show that $y_1[n]$ can be filtered by $(1/\sqrt{2})[1, -1]$ and downsampled by 2 while still allowing perfect reconstruction.

(c) Show that (b) is equivalent to a two-channel perfect reconstruction filter bank with filters $h_0[n] = (1/\sqrt{2})[1, 1]$ and $h_1[n] = (1/\sqrt{2})[1, -1]$.
(d) Show that (b) and (c) are true for general orthogonal lowpass filters, that is, $y_1[n]$ can be filtered by $g[n] = (-1)^n h[-n + L - 1]$ and downsampled by 2, and reconstruction is still perfect using an appropriate filter bank.

3.26 Verify Parseval's formula (3.5.3) in the tight frame case given in Section 3.5.1. 

3.27 Consider a two-dimensional two-channel filter bank with quincunx downsampling. Assume that $H_0(z_1, z_2)$ and $H_1(z_1, z_2)$ satisfy (3.6.4)-(3.6.5). Show that their impulse responses with shifts on a quincunx lattice form an orthonormal basis for $\ell_2(\mathbb{Z}^2)$.

3.28 Linear phase diamond-shaped quincunx filters: We want to construct a perfect reconstruction linear phase filter bank for quincunx sampling and the matrix $D$.

To that end, we start with the following filters $h_0[n_1, n_2]$ and $h_1[n_1, n_2]$:

I b \ 

h [ni,n 2 ] = 1 a 1 , 



h 1 [n 1 ,n 2 ] — 



fe+f a b+- 

c d c 

fe+f a b+- 

1 



\ 



where the origin is where the leftmost coefficient is. 

(a) Using the sampling matrix above, identify the polyphase components and verify that 
perfect FIR reconstruction is possible (the determinant of the polyphase matrix has 
to be a monomial). 

(b) Instead of only having top-bottom, left-right symmetry, impose circular symmetry on the filters. What are $b$, $c$? If $a = -4$, $d = -28$, what type of filters do we obtain (lowpass/highpass)?



Series Expansions Using Wavelets 
and Modulated Bases 



"All this time, the guard was looking at her, first 

through a telescope, then through a microscope, 

and then through an opera glass" 

- Lewis Carroll, Through the Looking Glass 



Series expansions of continuous-time signals or functions go back at least to Fourier's original expansion of periodic functions. The idea of representing a signal as a sum of elementary basis functions or, equivalently, of finding orthonormal bases for certain function spaces, is very powerful. However, classic approaches have limitations; in particular, there are no "good" local Fourier series that have both good time and frequency localization.

An alternative is the Haar basis where, in addition to time shifting, one uses scaling instead of modulation in order to obtain an orthonormal basis for $L_2(\mathbb{R})$ [126]. This interesting construction was somewhat of a curiosity (together with a few other special constructions) until wavelet bases were found in the 1980s [71, 180, 194, 21, 22, 175, 283]. Not only are there "good" orthonormal bases, but there also exist efficient algorithms to compute the wavelet coefficients. This is due to a fundamental relation between the continuous-time wavelet series and a set of (discrete-time) sequences. These correspond to a discrete-time filter bank which can be used, under certain conditions, to compute the wavelet series expansion. These relations follow from multiresolution analysis, a framework for analyzing wavelet bases [180, 194]. The emphasis of this chapter is on the construction of wavelet series. We also discuss local Fourier series and the construction of local cosine bases, which are "good" modulated bases [61]. Note that in this chapter we construct bases for $L_2(\mathbb{R})$; however, these bases have much stronger characteristics, as they are actually unconditional bases for $L_p$ spaces, $1 < p < \infty$ [73].

The development of wavelet orthonormal bases has been quite explosive in the 
last decade. While the initial work focused on the continuous wavelet transform 
(see Chapter 5), the discovery of orthonormal bases by Daubechies [71], Meyer 
[194], Battle [21, 22], Lemarie [175], Stromberg [283], and others, led to a wealth
of subsequent work.

Compactly supported wavelets, following Daubechies' construction, are based 
on discrete-time filter banks, and thus many filter banks studied in Chapter 3 
can lead to wavelets. We list below, without attempting to be exhaustive, a few 
such constructions. Cohen, Daubechies and Feaveau [58] and Vetterli and Her- 
ley [318, 319] considered biorthogonal wavelet bases. Bases with more than one 
wavelet were studied by Zou and Tewfik [343, 344], Steffen, Heller, Gopinath and 
Burrus [277], and Soman, Vaidyanathan and Nguyen [275], among others. Mul- 
tidimensional, nonseparable wavelets following from filter banks were constructed 
by Cohen and Daubechies [57] and Kovacevic and Vetterli [163]. Recursive filter 
banks leading to wavelets with exponential decay were derived by Herley and Vet- 
terli [133, 130]. Rioul studied regularity of iterated filter banks [239], complexity of 
wavelet decomposition algorithms [245], and design of "good" wavelet filters [246]. 
More constructions relating filter banks and wavelets can be found, for example, in 
the work of Akansu and Haddad [3, 4], Blu [33], Cohen [55], Evangelista [96, 95], 
Gopinath [115], Herley [130], Lawton [170, 171], Rioul [240, 242, 243, 244] and 
Soman and Vaidyanathan [274]. 

The study of the regularity of the iterated filter that leads to wavelets was done 
by Daubechies and Lagarias [74, 75], Cohen [55], and Rioul [239] and is related to 
work on recursive subdivision schemes which was done independently of wavelets 
(see [45, 80, 87, 92]). The regularity condition and approximation property occur- 
ring in wavelets are related to the Strang-Fix condition first derived in the context 
of finite-element methods [282]. 

Direct wavelet constructions followed the work of Meyer [194], Battle [21, 22] 
and Lemarie [175]. They rely on the multiresolution framework established by 
Mallat [181, 179, 180] and Meyer [194]. In particular, the case of wavelets related 
to splines was studied by Chui [52, 49, 50] and by Aldroubi and Unser [7, 296, 297]. 
The extension of the wavelet construction for rational rather than integer dilation 
factors was done by Auscher [16] and Blu [33]. Approximation properties of wavelet 
expansions have been studied by Donoho [83], and DeVore and Lucier [82]. These 
results have interesting consequences for compression. 
The computation of the wavelet series coefficients using filter banks was studied
by Mallat [181, 179] and Shensa [261], among others. Wavelet sampling theorems 
are given by Aldroubi and Unser [6], Walter [328] and Xia and Zhang [340]. Local 
cosine bases were derived by Coifman and Meyer [61] (see also [17]). The wave- 
let framework has also proven useful in the context of analysis and synthesis of 
stochastic processes, see for example [20, 178, 338, 339]. 

The material in this chapter is covered in more depth in Daubechies' book [73] 
to which we refer for more details. Our presentation is less formal and based mostly 
on signal processing concepts. 

The outline of the chapter is as follows: First, we discuss series expansions in 
general and the need for structured series expansion with good time and frequency 
localization. In particular, the local Fourier series is contrasted with the Haar 
expansion and a proof that the Haar system is an orthonormal basis for $L_2(\mathbb{R})$ is
given. In Section 4.2, we introduce multiresolution analysis and show how a wavelet
basis can be constructed. As an example, the sinc (or Littlewood-Paley) wavelet is
derived. Section 4.3 gives wavelet bases constructions in the Fourier domain, using 
the Meyer and Battle-Lemarie wavelets as important examples. Section 4.4 gives 
the construction of wavelets based on iterated filter banks. The regularity (condi- 
tions under which filter banks generate wavelet bases) of the discrete-time filters is 
studied. In particular, the Daubechies' family of compactly supported wavelets is 
given. Section 4.5 discusses some of the properties of orthonormal wavelet series 
expansions as well as the computation of the expansion coefficients. Variations on 
the theme of wavelets from filter banks are explored in Section 4.6, where biorthog- 
onal bases, wavelets based on IIR filter banks and wavelets with integer dilation 
factors greater than 2 are given. Section 4.7 discusses multidimensional wavelets 
obtained from multidimensional filter banks. Finally, Section 4.8 gives an interest- 
ing alternative to local Fourier series in the form of local cosine bases which have 
better time-frequency behavior than their Fourier counterparts. 



4.1 Definition of the Problem 

4.1.1 Series Expansions of Continuous-Time Signals

In the last chapter orthonormal bases were built for discrete-time sequences, that
is, sets of orthonormal sequences $\{\varphi_k[n]\}_{k\in\mathbb{Z}}$ were found such that any signal
$x[n]\in\ell_2(\mathbb{Z})$ could be written as

$$x[n] = \sum_{k=-\infty}^{\infty} \langle \varphi_k[m], x[m]\rangle\, \varphi_k[n],$$

where

$$\langle \varphi_k[m], x[m]\rangle = \sum_{m=-\infty}^{\infty} \varphi_k^*[m]\, x[m].$$

In this chapter the aim is to represent continuous-time functions in terms of a
series expansion. We intend to find sets of orthonormal continuous-time functions
$\{\varphi_k(t)\}$ such that signals $f(t)$ belonging to a certain class (for example, $L_2(\mathbb{R})$) can
be expressed as

$$f(t) = \sum_{k=-\infty}^{\infty} \langle \varphi_k(u), f(u)\rangle\, \varphi_k(t),$$

where

$$\langle \varphi_k(u), f(u)\rangle = \int_{-\infty}^{\infty} \varphi_k^*(u)\, f(u)\, du.$$

In other words, $f(t)$ can be written as the sum of its orthogonal projections onto
the basis vectors $\varphi_k(t)$. Besides having to meet orthonormality constraints, or

$$\langle \varphi_k(u), \varphi_l(u)\rangle = \delta[k-l],$$

the set $\{\varphi_k(t)\}$ also has to be complete: its span has to cover the space of functions
to be represented.

We start by briefly reviewing two standard series expansions that were studied
in Section 2.4. The better-known series expansion is certainly the Fourier series.
A periodic function, $f(t+nT) = f(t)$, can be written as a linear combination of
sines and cosines or complex exponentials, as

$$f(t) = \sum_{k=-\infty}^{\infty} F[k]\, e^{j(2\pi k t)/T}, \tag{4.1.1}$$

where the $F[k]$'s are the Fourier coefficients obtained as

$$F[k] = \frac{1}{T}\int_{-T/2}^{T/2} e^{-j(2\pi k t)/T}\, f(t)\, dt, \tag{4.1.2}$$

that is, the Fourier transform of one period, divided by $T$ and evaluated at integer
multiples of $\omega_0 = 2\pi/T$. It is easy to see that the set of functions
$\{e^{j(2\pi k t)/T},\ k\in\mathbb{Z},\ t\in[-T/2, T/2]\}$ is an orthogonal set, that is,

$$\langle e^{j(2\pi k t)/T}, e^{j(2\pi l t)/T}\rangle_{[-T/2,\,T/2]} = T\,\delta[k-l].$$

Since the set is also complete, it is an orthonormal basis for functions belonging to
$L_2([-T/2, T/2])$ (up to a scale factor of $1/\sqrt{T}$).
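As a small numerical illustration of (4.1.2) and of the orthogonality relation above, the Python sketch below approximates the inner product on $[-T/2, T/2)$ by a uniform Riemann sum. The helper names (`inner_product`, `fourier_basis`, `fourier_coefficient`) and the default sample count are illustrative assumptions, not part of the text. For trigonometric polynomials whose frequency spread is below the number of samples, this Riemann sum happens to be exact up to rounding, which is why it suits this check.

```python
import cmath
import math

# Sketch only: helper names are illustrative, not taken from the book.


def inner_product(f, g, T, num_samples=1024):
    """Approximate <f, g> = integral over [-T/2, T/2) of conj(f(t)) g(t) dt
    with a uniform left Riemann sum."""
    dt = T / num_samples
    total = 0j
    for i in range(num_samples):
        t = -T / 2 + i * dt
        total += f(t).conjugate() * g(t)
    return total * dt


def fourier_basis(k, T):
    """Return the basis function t -> exp(j 2 pi k t / T) used in (4.1.1)."""
    return lambda t: cmath.exp(2j * math.pi * k * t / T)


def fourier_coefficient(f, k, T, num_samples=1024):
    """F[k] = (1/T) * integral of exp(-j 2 pi k t / T) f(t) dt over one period, cf. (4.1.2)."""
    return inner_product(fourier_basis(k, T), f, T, num_samples) / T
```

For $f(t) = \cos(2\pi t/T)$, for example, `fourier_coefficient` gives $F[\pm 1] \approx 1/2$ and $F[k] \approx 0$ for the other $k$, as Euler's formula predicts.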
The other standard series expansion is that of bandlimited signals (see also
Section 2.4.5). Provided that $|X(\omega)| = 0$ for $|\omega| > \omega_s/2 = \pi/T$, then sampling
$x(t)$ by multiplying with Dirac impulses at integer multiples of $T$ leads to the
function $x_s(t)$ given by

$$x_s(t) = \sum_{n=-\infty}^{\infty} x(nT)\, \delta(t-nT).$$

The Fourier transform of $x_s(t)$ is periodic with period $\omega_s$ and is given by (see Section
2.4.5)

$$X_s(\omega) = \frac{1}{T}\sum_{k=-\infty}^{\infty} X(\omega - k\omega_s). \tag{4.1.3}$$

From (4.1.3) it follows that the Fourier transforms of $x(t)$ and $x_s(t)$ coincide over the
interval $(-\omega_s/2, \omega_s/2)$ (up to a scale factor), that is, $X(\omega) = T X_s(\omega)$, $|\omega| < \omega_s/2$.
Thus, to reconstruct the original signal $X(\omega)$, we have to window the sampled signal
spectrum $X_s(\omega)$, or $X(\omega) = G(\omega) X_s(\omega)$, where $G(\omega)$ is the window function

$$G(\omega) = \begin{cases} T & |\omega| < \omega_s/2,\\ 0 & \text{otherwise}.\end{cases}$$

Its inverse Fourier transform,

$$g(t) = \mathrm{sinc}_T(t) = \frac{\sin(\pi t/T)}{\pi t/T}, \tag{4.1.4}$$

is called the sinc function.¹ In the time domain, we convolve the sampled function $x_s(t)$
with the window function $g(t)$ to recover $x(t)$:

$$x(t) = x_s(t) * g(t) = \sum_{n=-\infty}^{\infty} x(nT)\, \mathrm{sinc}_T(t-nT). \tag{4.1.5}$$

This is usually referred to as the sampling theorem (see Section 2.4.5). Note that
the interpolation functions $\{\mathrm{sinc}_T(t-nT)\}_{n\in\mathbb{Z}}$ form an orthogonal set, that is

$$\langle \mathrm{sinc}_T(t-mT), \mathrm{sinc}_T(t-nT)\rangle = T\,\delta[m-n].$$

Then, since $x(t)$ is bandlimited, the process of sampling at times $nT$ can be written
as

$$x(nT) = \frac{1}{T}\langle \mathrm{sinc}_T(u-nT), x(u)\rangle,$$

or convolving $x(t)$ with $\mathrm{sinc}_T(-t)$ and sampling the resulting function at times $nT$.
Thus, (4.1.5) is an expansion of a signal into an orthogonal basis

$$x(t) = \frac{1}{T}\sum_{n=-\infty}^{\infty} \langle \mathrm{sinc}_T(u-nT), x(u)\rangle\, \mathrm{sinc}_T(t-nT). \tag{4.1.6}$$

Moreover, if a signal is not bandlimited, then (4.1.6) performs an orthogonal projection
onto the space of signals bandlimited to $(-\omega_s/2, \omega_s/2)$ (see Section 2.4.5).

¹The standard definition from the digital signal processing literature is used here, even if it
would make sense to divide the sinc by $\sqrt{T}$ to make it of unit norm.
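The interpolation formula (4.1.5) can be tried out numerically with a truncated sum. In the sketch below the names `sinc_T` and `sinc_interpolate` are illustrative, and the test signal $x(t) = \mathrm{sinc}_{2T}(t)^2$ is our own choice: its spectrum is a triangle supported on $|\omega| \le \pi/T$, so it is bandlimited for sampling period $T$, and its samples decay like $1/n^2$, which keeps the truncation error small despite the slow $1/t$ decay of the sinc itself.

```python
import math

# Sketch only: function names and the test signal are illustrative choices.


def sinc_T(t, T):
    """sinc_T(t) = sin(pi t / T) / (pi t / T), with sinc_T(0) = 1, as in (4.1.4)."""
    if t == 0:
        return 1.0
    arg = math.pi * t / T
    return math.sin(arg) / arg


def sinc_interpolate(samples, T, t):
    """Truncated version of (4.1.5): x(t) ~ sum_n x(nT) sinc_T(t - nT).

    `samples` maps the integer index n to the sample value x(nT); indices
    missing from the mapping are treated as zero (this is the truncation)."""
    return sum(x_n * sinc_T(t - n * T, T) for n, x_n in samples.items())
```

With `T = 1.0`, `x = lambda t: sinc_T(t, 2 * T) ** 2` and samples for $|n| \le 2000$, the interpolated values at non-sample points such as $t = 0.37$ agree with $x(t)$ to well within $10^{-5}$.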

4.1.2 Time and Frequency Resolution of Expansions 

Having seen two possible series expansions (Fourier series and sinc expansion), let us
discuss some of their properties. First, both cases deal with a limited signal space:
periodic or bandlimited. In what follows, we will be interested in representing more
general signals. Then, the basis functions, while having closed-form expressions,
have poor decay in time (no decay in the Fourier series case, $1/t$ decay in the sinc
case). Local effects spread over large regions of the transform domain. This is often
undesirable if one wants to detect some local disturbance in a signal, which is a
classic task in nonstationary signal analysis.

In this chapter, we construct alternative series expansions, mainly based on 
wavelets. But first, let us list a few desirable features of basis functions [238]: 

(a) Simple characterization. 

(b) Desirable localization properties in both time and frequency, that is, appro- 
priate decay in both domains. 

(c) Invariance under certain elementary operations (for example, shifts in time). 

(d) Smoothness properties (continuity, differentiability). 

(e) Moment properties (zero moments, see Section 4.5). 

However, some of the above requirements conflict with each other and ultimately, 
the application at hand will greatly influence the choice of the basis. 

In addition, it is often desirable to look at a signal at different resolutions, that 
is, both globally and locally. This feature is missing in classical Fourier analysis. 
Such a multiresolution approach is not only important in many applications (ranging 
from signal compression to image understanding), but is also a powerful theoretical 
framework for the construction and analysis of wavelet bases as alternatives to 
Fourier bases. 

In order to satisfy some of the above requirements, let us first review how one 
can modify Fourier analysis so that local signal behavior in time can be seen even 
in the transform domain. We thus reconsider the short-time Fourier transform (STFT) or
Gabor transform introduced in Section 2.6. The idea is to window the signal (that
is, multiply the signal by an appropriate windowing function centered around the 
point of interest), and then take its Fourier transform. To analyze the complete 
signal, one simply shifts the window over the whole time range in sufficiently small 
steps so as to have substantial overlap between adjacent windows. This is a very 
redundant representation (the signal has been mapped into an infinite set of Fourier 
transforms) and thus it can be sampled. This scheme will be further analyzed in 
Section 5.3. 

As an alternative, consider a "local Fourier series" obtained as follows: Starting
with an infinite and arbitrary signal, divide it into pieces of length $T$ and expand
each piece in terms of a Fourier series. Note that at the boundary between two
intervals the expansion will in general be incorrect because the periodization creates
a discontinuity. However, this error has zero energy, and therefore this simple
scheme is a possible orthogonal expansion which has both a frequency index (corresponding
to multiples of $\omega_0 = 2\pi/T$) and a time index (corresponding to the
interval number, or the multiple of the interval length $T$). That is, we can expand
$x(t)$ as (following (4.1.1), (4.1.2))

$$\hat{x}(t) = \sum_{m=-\infty}^{\infty}\sum_{n=-\infty}^{\infty} \langle \varphi_{m,n}(u), x(u)\rangle\, \varphi_{m,n}(t), \tag{4.1.7}$$

where

$$\varphi_{m,n}(u) = \begin{cases} \dfrac{1}{\sqrt{T}}\, e^{j2\pi n(u-mT)/T} & u\in[mT - T/2,\ mT + T/2),\\ 0 & \text{otherwise}.\end{cases}$$

The $1/\sqrt{T}$ factor makes the basis functions of unit norm. The expansion $\hat{x}(t)$
is equal to $x(t)$ almost everywhere (except at $t = (m+1/2)T$) and thus, the $L_2$
norm of the difference $\hat{x}(t) - x(t)$ is equal to zero. We call this transform a piecewise
Fourier series.

Consider what has been achieved. The expansion in (4.1.7) is valid for arbitrary 
functions. Then, instead of an integral expansion as in the Fourier transform, we 
have a double-sum expansion, and the set of basis functions is orthonormal and 
complete. Time locality is now achieved and there is some frequency localization 
(not very good, however, because the basis functions are rectangular windowed
sinusoids and therefore discontinuous; their Fourier transforms decay only as $1/\omega$).
In terms of time-frequency resolution, we have the rectangular tiling of the
time-frequency plane that is typical of the short-time Fourier transform (as was shown
in Figure 2.12(b)). 
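A discrete-time analog of the piecewise Fourier series (4.1.7) can be sketched by cutting a sampled signal into blocks of $N$ samples and applying a unitary DFT to each block: the block index plays the role of $m$ and the DFT index that of $n$. This is an assumption-laden stand-in (sampled rather than continuous signal, blocks starting at sample $mN$ rather than centered on $mT$, a plain $O(N^2)$ DFT, hypothetical function names). It only shows that such a double-indexed expansion is orthonormal: energy is preserved and every block is recovered exactly.

```python
import cmath
import math

# Sketch only: a discrete stand-in for (4.1.7); names are hypothetical.


def unitary_dft(block):
    """X[k] = (1/sqrt(N)) * sum_n x[n] exp(-j 2 pi k n / N)."""
    N = len(block)
    scale = 1 / math.sqrt(N)
    return [scale * sum(x * cmath.exp(-2j * math.pi * k * n / N) for n, x in enumerate(block))
            for k in range(N)]


def unitary_idft(coeffs):
    """Inverse of unitary_dft."""
    N = len(coeffs)
    scale = 1 / math.sqrt(N)
    return [scale * sum(X * cmath.exp(2j * math.pi * k * n / N) for k, X in enumerate(coeffs))
            for n in range(N)]


def local_fourier_analysis(x, N):
    """Cut x into consecutive blocks of N samples (time index m) and expand
    each block on its own (frequency index k)."""
    if len(x) % N:
        raise ValueError("signal length must be a multiple of the block length N")
    return [unitary_dft(x[m * N:(m + 1) * N]) for m in range(len(x) // N)]


def local_fourier_synthesis(coeffs):
    """Invert block by block and concatenate; keep the real part for real input."""
    out = []
    for block in coeffs:
        out.extend(v.real for v in unitary_idft(block))
    return out
```

Because each block is handled independently, nothing ties neighboring blocks together, which mirrors the boundary discontinuities discussed next.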

However, there is a price to be paid. The size of the interval $T$ (that is, the
location of the boundaries) is arbitrary and leads to problems. The reconstruction
$\hat{x}(t)$ has singular points even if $x(t)$ is continuous, and the transform of $x(t)$ can have
infinitely many "high frequency" components even if $x(t)$ is a simple sinusoid (for
example, if its period $T_s$ is such that $T_s/T$ is irrational). Therefore, the expansion
will converge slowly to the function. In other words, if one wants to approximate
the signal with a truncated series, the quality of the approximation will depend on
the choice of $T$. In particular, the convergence at points of discontinuity (created
by periodization) is poor due to the Gibbs phenomenon [218]. Finally, a shift of
the signal can lead to completely different transform coefficients, and the transform
is thus time-variant.

In short, we have gained the flexibility of a double-indexed transform indicating 
time and frequency, but we have lost time invariance and convergence is sometimes 
poor. Note that some of these problems are inherent to local Fourier bases and can 
be solved with local cosine bases discussed in Section 4.8. 

4.1.3 Haar Expansion 

We explore the Haar expansion because it is the simplest example of a wavelet 
expansion, yet it contains all the ingredients of such constructions. It also addresses 
some of the problems we mentioned for the local Fourier series. The arbitrariness 
of a single window of fixed length T, as discussed, is avoided by having a variable 
size window. Time invariance is not obtained (actually, requiring locality in time 
implies time variance). The Haar wavelet, or prototype basis function, has finite
support in time and $1/\omega$ decay in frequency. Note that it has its dual in the so-called
sinc wavelet (discussed in Section 4.2), which has finite support in frequency
and $1/t$ decay in time. We will see that the Haar and sinc wavelets are two extreme
examples and that all the other examples of interest will have a behavior that lies
in between.

The Haar wavelet is defined as

$$\psi(t) = \begin{cases} 1 & 0 \le t < \tfrac{1}{2},\\ -1 & \tfrac{1}{2} \le t < 1,\\ 0 & \text{otherwise},\end{cases} \tag{4.1.8}$$

and the whole set of basis functions is obtained by dilation and translation as

$$\psi_{m,n}(t) = 2^{-m/2}\,\psi(2^{-m}t - n), \qquad m,n\in\mathbb{Z}. \tag{4.1.9}$$

We call $m$ the scale factor, since $\psi_{m,n}(t)$ is of length $2^m$, while $n$ is called the shift
factor, and the shift is scale dependent ($\psi_{m,n}(t)$ is shifted by $2^m n$). The normalization
factor $2^{-m/2}$ makes $\psi_{m,n}(t)$ of unit norm. The Haar wavelet is shown in
Figure 4.1(c) (part (a) shows the scaling function which will be introduced shortly).
A few of the basis functions are shown in Figure 4.2(a).

Figure 4.1 The Haar scaling function and wavelet, given in Table 4.1. (a) The
scaling function $\varphi(t)$. (b) Fourier transform magnitude $|\Phi(\omega)|$. (c) Wavelet
$\psi(t)$. (d) Fourier transform magnitude $|\Psi(\omega)|$.

It is easy to see that the set is orthonormal. At a given scale, $\psi_{m,n}(t)$ and $\psi_{m,n'}(t)$,
$n \neq n'$, have no common support. Across scales, even if there is common support, the
larger basis function is constant over the support of the shorter one. Therefore, the
inner product amounts to that constant times the integral of the shorter one, which
is zero (see Figure 4.2(b)). Hence,

$$\langle \psi_{m,n}(t), \psi_{m',n'}(t)\rangle = \delta[m-m']\,\delta[n-n'].$$

The advantage of these basis functions is that they are well localized in time (the
support is finite). Actually, as $m \to -\infty$, they are arbitrarily sharp in time, since
the length goes to zero. That is, a discontinuity (for example, a step in a function)
will be localized with arbitrary precision. However, the frequency localization is not
very good since the Fourier transform of (4.1.8) decays only as $1/\omega$ when $\omega \to \infty$.
The basis functions are not smooth, since they are not even continuous.

Figure 4.2 The Haar basis. (a) A few of the Haar basis functions. (b) Haar
wavelets are orthogonal across scales since the inner product is equal to the
average of the shorter one.
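The orthonormality argument above can be checked numerically. The sketch below (function names are illustrative, not from the text) evaluates inner products of Haar functions (4.1.9) with a midpoint rule on a dyadic grid of step $2^{-K}$. Since the Haar functions are piecewise constant with breakpoints on multiples of $2^{m-1}$, this rule is exact up to rounding whenever $m \ge 1-K$, so the Gram matrix should come out as the identity.

```python
# Sketch only: names and the integration grid are illustrative choices.


def haar_psi(t):
    """Haar wavelet (4.1.8): 1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def haar_psi_mn(m, n):
    """psi_{m,n}(t) = 2^(-m/2) psi(2^(-m) t - n), as in (4.1.9)."""
    scale = 2.0 ** (-m / 2)
    return lambda t: scale * haar_psi(2.0 ** (-m) * t - n)


def inner(f, g, lo, hi, K=8):
    """Midpoint rule on a grid of step 2^(-K) over [lo, hi).

    Exact (up to rounding) for piecewise constant real f, g whose breakpoints
    lie on the grid, i.e. for Haar functions with scale m >= 1 - K."""
    step = 2.0 ** (-K)
    count = round((hi - lo) / step)
    total = 0.0
    for i in range(count):
        t = lo + (i + 0.5) * step
        total += f(t) * g(t)
    return total * step
```

For instance, `inner(haar_psi_mn(0, 0), haar_psi_mn(-1, 1), -4.0, 4.0)` vanishes, illustrating orthogonality across scales: $\psi_{-1,1}$ sits on $[1/2, 1)$, where $\psi_{0,0}$ is constant.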



One of the fundamental characteristics of the wavelet type expansions which we 
will discuss in more detail later is that they are series expansions with a double 
sum. One is for shifts, the other is for scales and there is a trade-off between time 
and frequency resolutions. This resolution trade-off is what differentiates this double-sum
expansion from the one given in (4.1.7). Now, long basis functions (for $m$ large and
positive) are sharp in frequency (with corresponding loss of time resolution), while
short basis functions (for negative $m$ with large absolute value) are sharp in time.
Conceptually, we obtain a tiling of the time-frequency plane as was shown in Figure 
2.12(d), that is, a dyadic tiling rather than the rectangular tiling of the short-time 
Fourier transform shown in Figure 2.12(b). 

In what follows, the proof that the Haar system is a basis for $L_2(\mathbb{R})$ is given
using a multiresolution flavor [73]. Thus, it has more than just technical value;
the intuition gained and concepts introduced will be used again in later wavelet 
constructions. 



Theorem 4.1 



The set of functions $\{\psi_{m,n}(t)\}_{m,n\in\mathbb{Z}}$, with $\psi(t)$ and $\psi_{m,n}(t)$ as in (4.1.8) and (4.1.9),
is an orthonormal basis for $L_2(\mathbb{R})$.
Figure 4.3 Haar wavelet decomposition of a piecewise continuous function. Here,
$m_0 = 0$ and $m_1 = 3$. (a) Original function $f^{(0)}$. (b) Average function $f^{(1)}$. (c)
Difference $d^{(1)}$ between (a) and (b). (d) Average function $f^{(2)}$. (e) Difference
$d^{(2)}$. (f) Average function $f^{(3)}$.



Proof

The idea is to consider functions which are constant on intervals $[n2^{-m_0}, (n+1)2^{-m_0})$ and
which have finite support on $[-2^{m_1}, 2^{m_1})$, as shown in Figure 4.3(a). By choosing $m_0$ and
$m_1$ large enough, one can approximate any $L_2(\mathbb{R})$ function arbitrarily well. Call such a
piecewise constant function $f^{(-m_0)}(t)$. Introduce a unit norm indicator function for the
interval $[n2^{-m_0}, (n+1)2^{-m_0})$:

$$\varphi_{-m_0,n}(t) = \begin{cases} 2^{m_0/2} & n2^{-m_0} \le t < (n+1)2^{-m_0},\\ 0 & \text{otherwise}.\end{cases} \tag{4.1.10}$$

This is called the scaling function in the Haar case. Obviously, $f^{(-m_0)}(t)$ can be written as
a linear combination of indicator functions from (4.1.10),

$$f^{(-m_0)}(t) = \sum_{n=-N}^{N-1} f_n^{(-m_0)}\, \varphi_{-m_0,n}(t), \tag{4.1.11}$$

where $N = 2^{m_0+m_1}$ and $f_n^{(-m_0)} = 2^{-m_0/2}\, f^{(-m_0)}(n\cdot 2^{-m_0})$. Now comes the key step:
examine two intervals $[2n\cdot 2^{-m_0}, (2n+1)2^{-m_0})$ and $[(2n+1)\cdot 2^{-m_0}, (2n+2)2^{-m_0})$. From
(4.1.11), the function over these two intervals is

$$f_{2n}^{(-m_0)}\, \varphi_{-m_0,2n}(t) + f_{2n+1}^{(-m_0)}\, \varphi_{-m_0,2n+1}(t). \tag{4.1.12}$$

However, the same function can be expressed as the average over the two intervals plus the
difference needed to obtain (4.1.12). The average is given by

$$\frac{f_{2n}^{(-m_0)} + f_{2n+1}^{(-m_0)}}{2}\cdot \sqrt{2}\cdot \varphi_{-m_0+1,n}(t),$$

while the difference can be expressed with the Haar wavelet as

$$\frac{f_{2n}^{(-m_0)} - f_{2n+1}^{(-m_0)}}{2}\cdot \sqrt{2}\cdot \psi_{-m_0+1,n}(t).$$

Note that here we have used the wavelet and the scaling function of twice the length. Their
support is $[n\cdot 2^{-m_0+1}, (n+1)2^{-m_0+1}) = [2n\cdot 2^{-m_0}, (2n+2)2^{-m_0})$. Also note that
the factor $\sqrt{2}$ is due to $\psi_{-m_0+1,n}(t)$ and $\varphi_{-m_0+1,n}(t)$ having height $2^{(m_0-1)/2} = 2^{m_0/2}/\sqrt{2}$,
instead of $2^{m_0/2}$ with which we started. Calling now

$$f_n^{(-m_0+1)} = \frac{1}{\sqrt{2}}\left(f_{2n}^{(-m_0)} + f_{2n+1}^{(-m_0)}\right)$$

and

$$d_n^{(-m_0+1)} = \frac{1}{\sqrt{2}}\left(f_{2n}^{(-m_0)} - f_{2n+1}^{(-m_0)}\right),$$

we can rewrite (4.1.12) as

$$f_n^{(-m_0+1)}\, \varphi_{-m_0+1,n}(t) + d_n^{(-m_0+1)}\, \psi_{-m_0+1,n}(t).$$

Applying the above to the pairs of intervals of the whole function, we finally obtain

$$\begin{aligned} f^{(-m_0)}(t) &= f^{(-m_0+1)}(t) + d^{(-m_0+1)}(t)\\ &= \sum_{n=-N/2}^{N/2-1} f_n^{(-m_0+1)}\, \varphi_{-m_0+1,n}(t) + \sum_{n=-N/2}^{N/2-1} d_n^{(-m_0+1)}\, \psi_{-m_0+1,n}(t). \end{aligned}$$

This decomposition into local "average" and "difference" is shown in Figures 4.3(b) and (c),
respectively. In order to obtain $f^{(-m_0+2)}(t)$ plus some linear combination of $\psi_{-m_0+2,n}(t)$,
one can iterate the averaging process on the function $f^{(-m_0+1)}(t)$ exactly as above (see
Figures 4.3(d), (e)). Repeating the process until the average is over intervals of length $2^{m_1}$
leads to

$$f^{(-m_0)}(t) = f^{(m_1)}(t) + \sum_{m=-m_0+1}^{m_1}\ \sum_{n=-2^{m_1-m}}^{2^{m_1-m}-1} d_n^{(m)}\, \psi_{m,n}(t). \tag{4.1.13}$$

The function $f^{(m_1)}(t)$ is equal to the average of $f^{(-m_0)}(t)$ over the intervals $[-2^{m_1}, 0)$ and
$[0, 2^{m_1})$, respectively (see Figure 4.3(f)). Consider the right half, which equals $f_0^{(m_1)}$ from
0 to $2^{m_1}$. It has $L_2$ norm equal to $|f_0^{(m_1)}|\, 2^{m_1/2}$. This function can further be decomposed
as the average over the interval $[0, 2^{m_1+1})$ plus a Haar function. The new average
function has norm $(|f_0^{(m_1)}|\, 2^{m_1/2})/\sqrt{2} = |f_0^{(m_1)}|\, 2^{(m_1-1)/2}$ (since there is no contribution from
$[2^{m_1}, 2^{m_1+1})$). Iterating this $M$ times shows that the norm of the average function decreases
as $(|f_0^{(m_1)}|\, 2^{m_1/2})/2^{M/2} = |f_0^{(m_1)}|\, 2^{(m_1-M)/2}$. The same argument holds for the left side as
well and therefore, $f^{(-m_0)}(t)$ can be approximated from (4.1.13) as

$$f^{(-m_0)}(t) = \sum_{m=-m_0+1}^{m_1+M}\ \sum_{n=-2^{m_1-m}}^{2^{m_1-m}-1} d_n^{(m)}\, \psi_{m,n}(t) + \varepsilon_M,$$

where $\|\varepsilon_M\| \le (|f_{-1}^{(m_1)}| + |f_0^{(m_1)}|)\cdot 2^{(m_1-M)/2}$. The approximation error $\varepsilon_M$ can thus be
made arbitrarily small since $|f_n^{(m_1)}|$, $n = -1, 0$, are bounded and $M$ can be made arbitrarily
large. This, together with the fact that $m_0$ and $m_1$ can be arbitrarily large, completes the
proof that any $L_2(\mathbb{R})$ function can be represented as a linear combination of Haar wavelets.
The key in the above proof was the decomposition into a coarse approximation
(the average) and a detail (the difference). Since the norm of the coarse version
goes to zero as the scale goes to infinity, any $L_2(\mathbb{R})$ function can be represented
as a succession of multiresolution details. This is the crux of the multiresolution
analysis presented in Section 4.2 and will prove to be a general framework, of which
the Haar case is a simple but enlightening example.

Let us point out a few features of the Haar case above. First, we can define
spaces $V_m$ of piecewise constant functions over intervals of length $2^m$. Obviously,
$V_m$ is included in $V_{m-1}$, and an orthogonal basis for $V_m$ is given by $\varphi_m$ and its shifts
by multiples of $2^m$. Now, call $W_m$ the orthogonal complement of $V_m$ in $V_{m-1}$. An
orthogonal basis for $W_m$ is given by $\psi_m$ and its shifts by multiples of $2^m$. The proof
above relied on decomposing $V_{-m_0}$ into $V_{-m_0+1}$ and $W_{-m_0+1}$, and then iterating
the decomposition again on $V_{-m_0+1}$ and so on. It is important to note that once
we had a signal in $V_{-m_0}$, the rest of the decomposition involved only discrete-time
computations (average and difference operations on previous coefficients). This is
a fundamental and attractive feature of wavelet series expansions which holds in
general, as we shall see.
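The purely discrete-time part of the proof of Theorem 4.1 (pairwise averages and differences scaled by $1/\sqrt{2}$) is easy to sketch. In the Python block below, the function names are hypothetical, and the coefficient list is assumed to hold $f_n^{(-m_0)}$ with list index $i$ standing for $n = i - N$. Each step preserves energy, since it is an orthogonal change of basis, and it can be undone exactly.

```python
import math

# Sketch only: names are hypothetical; list index i stands for n = i - N.


def haar_step(coeffs):
    """One averaging/differencing step from the proof of Theorem 4.1:
    f_n' = (f_{2n} + f_{2n+1}) / sqrt(2),  d_n' = (f_{2n} - f_{2n+1}) / sqrt(2)."""
    if len(coeffs) % 2:
        raise ValueError("need an even number of coefficients")
    s = 1 / math.sqrt(2)
    pairs = list(zip(coeffs[0::2], coeffs[1::2]))
    averages = [s * (a + b) for a, b in pairs]
    differences = [s * (a - b) for a, b in pairs]
    return averages, differences


def haar_decompose(coeffs, levels):
    """Iterate haar_step on the averages; return the final averages and the
    list of detail sequences, finest scale first."""
    averages, details = list(coeffs), []
    for _ in range(levels):
        averages, d = haar_step(averages)
        details.append(d)
    return averages, details


def haar_reconstruct(averages, details):
    """Undo haar_decompose: (a + d)/sqrt(2) and (a - d)/sqrt(2) give back each pair."""
    s = 1 / math.sqrt(2)
    f = list(averages)
    for d in reversed(details):
        f = [v for a, b in zip(f, d) for v in (s * (a + b), s * (a - b))]
    return f
```

A pair of equal neighboring coefficients produces a zero detail coefficient, and after $L$ levels the single remaining average equals the coefficient sum divided by $2^{L/2}$.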

4.1.4 Discussion 

As previously mentioned, the Haar case (seen above) and the sinc case (in Section
4.2.3) are two extreme cases, and the purpose of this chapter is to construct "intermediate"
solutions with additional desirable properties. For example, Figure 4.4
shows a wavelet constructed first by Daubechies [71] which has finite (compact)
support (its length is $L = 3$, that is, less local than the Haar wavelet which has
length 1) but is continuous and has better frequency resolution than the Haar wavelet.
While not achieving a frequency resolution comparable to the sinc wavelet, its
time resolution is much improved since it has finite length. This is only one of many
possible wavelet constructions, some of which will be shown in more detail later. 

We have shown that it is possible to construct series expansions of general 
functions. The resulting tiling of the time-frequency plane is different from that 
of a local Fourier series. It has the property that high frequencies are analyzed 
with short basis functions, while low frequencies correspond to long basis functions. 
While this trade-off is intuitive for many "natural" functions or signals, it is not the 
only one; therefore, alternative tilings will also be explored. One elegant property 
of wavelet type bases is the self-similarity of the basis functions, which are all 
obtained from a single prototype "mother" wavelet using scaling and translation. 
This is unlike local Fourier analysis, where modulation is used instead of scaling. 
The basis functions and the associated tiling for the local Fourier analysis (short-time
Fourier transform) were seen in Figures 2.12(a) and (b). Compare these to the
wavelet-type tiling and the corresponding basis functions given in Figures 2.12(c)
and (d), where scaling has replaced modulation. One can see that a dyadic tiling
has been obtained.

Figure 4.4 Scaling function and wavelet obtained from iterating Daubechies'
4-tap filter. (a) Scaling function $\varphi(t)$. (b) Fourier transform magnitude $|\Phi(\omega)|$.
(c) Wavelet $\psi(t)$. (d) Fourier transform magnitude $|\Psi(\omega)|$.



4.2 MULTIRESOLUTION CONCEPT AND ANALYSIS 

In this section, we analyze signal decompositions which rely on successive approxi- 
mation (the Haar case is a particular example). A given signal will be represented 
by a coarse approximation plus added details. We show that the coarse and de- 
tail subspaces are orthogonal to each other. In other words, the detail signal is 
the difference between the fine and the coarse version of the signal. By applying 
the successive approximation recursively, we will see that the space of input signals
$L_2(\mathbb{R})$ can be spanned by spaces of successive details at all resolutions. This follows
because, as the detail resolution goes to infinity, the approximation error goes to
zero.




Note that this multiresolution approach, pioneered by Mallat [180] and Meyer 
[194], is not only a set of tools for deriving wavelet bases, but also a mathematical 
framework which is very useful in conceptualizing problems linked to wavelet and 
subband decompositions of signals. We will also see that multiresolution analysis 
leads to particular orthonormal bases, with basis functions being self-similar at 
different scales. We will also show that a multiresolution analysis leads to the two- 
scale equation property and that some special discrete-time sequences play a special 
role in that they are equivalent to the filters in an orthogonal filter bank. 

4.2.1 Axiomatic Definition of Multiresolution Analysis 

Let us formally define multiresolution analysis. We will adhere to the choice of 
axioms as well as the ordering of spaces adopted by Daubechies in [73]. 

Definition 4.2 

A multiresolution analysis consists of a sequence of embedded closed subspaces

$$\cdots V_2 \subset V_1 \subset V_0 \subset V_{-1} \subset V_{-2} \cdots \tag{4.2.1}$$

such that

(a) Upward Completeness

$$\overline{\bigcup_{m\in\mathbb{Z}} V_m} = L_2(\mathbb{R}). \tag{4.2.2}$$

(b) Downward Completeness

$$\bigcap_{m\in\mathbb{Z}} V_m = \{0\}. \tag{4.2.3}$$

(c) Scale Invariance

$$f(t)\in V_m \iff f(2^m t)\in V_0. \tag{4.2.4}$$

(d) Shift Invariance

$$f(t)\in V_0 \implies f(t-n)\in V_0, \quad \text{for all } n\in\mathbb{Z}. \tag{4.2.5}$$

(e) Existence of a Basis There exists $\varphi\in V_0$ such that

$$\{\varphi(t-n) \mid n\in\mathbb{Z}\} \tag{4.2.6}$$

is an orthonormal basis for $V_0$.
Remarks

(a) If we denote by $\mathrm{Proj}_{V_m}[f(t)]$ the orthogonal projection of $f(t)$ onto $V_m$, then
(4.2.2) states that $\lim_{m\to-\infty}\mathrm{Proj}_{V_m}[f(t)] = f(t)$.

(b) The multiresolution notion comes into play only with (4.2.4), since all the
spaces are just scaled versions of the central space $V_0$ [73].

(c) As seen earlier for the Haar case, the function $\varphi(t)$ in (4.2.6) is called the
scaling function.

(d) Using the Poisson formula, the orthonormality of the family $\{\varphi(t-n)\}_{n\in\mathbb{Z}}$
as given in (4.2.6) is equivalent to the following in the Fourier domain (see
(2.4.31)):

$$\sum_{k=-\infty}^{\infty} |\Phi(\omega + 2k\pi)|^2 = 1. \tag{4.2.7}$$

(e) Using (4.2.4-4.2.6), one obtains that $\{2^{m/2}\varphi(2^m t - n) \mid n\in\mathbb{Z}\}$ is a basis for
$V_{-m}$.

(f) The orthogonality of $\varphi(t)$ is not necessary, since a nonorthogonal basis (with
the shift property) can always be orthogonalized [180] (see also Section 4.3.2).
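Condition (4.2.7) can be checked numerically for the Haar scaling function (the indicator of $[0, 1)$), whose Fourier transform satisfies $|\Phi(\omega)|^2 = \left(\sin(\omega/2)/(\omega/2)\right)^2$. The sketch below truncates the sum to $|k| \le K$; the names and the default $K$ are illustrative assumptions. The neglected tail is of order $1/K$, so agreement with 1 to roughly $10^{-5}$ is what one should expect here, not machine precision.

```python
import math

# Sketch only: names and truncation length K are illustrative choices.


def haar_phi_hat_sq(w):
    """|Phi(w)|^2 for the Haar scaling function (indicator of [0, 1)):
    (sin(w/2) / (w/2))^2, with value 1 at w = 0."""
    if w == 0:
        return 1.0
    return (math.sin(w / 2) / (w / 2)) ** 2


def periodized_energy(phi_hat_sq, w, K=20000):
    """Truncated left-hand side of (4.2.7): sum over |k| <= K of |Phi(w + 2 k pi)|^2."""
    return sum(phi_hat_sq(w + 2 * k * math.pi) for k in range(-K, K + 1))
```

The same helper can be reused for any scaling function whose $|\Phi(\omega)|^2$ is available in closed form.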

As an example, define $V_m$ as the space of functions which are piecewise constant
over intervals of length $2^m$ and define $\varphi(t)$ as the indicator function of the unit
interval. Then, it is easy to verify that the Haar example in the previous section 
satisfies the axioms of multiresolution analysis (see Example 4.1 below). 

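Because of the embedding of spaces (4.2.1) and the scaling property (4.2.4), we
can verify that the scaling function $\varphi(t)$ satisfies a two-scale equation. Since $V_0$ is
included in $V_{-1}$, $\varphi(t)$, which belongs to $V_0$, belongs to $V_{-1}$ as well. As such, it can
be written as a linear combination of basis functions from $V_{-1}$. However, we know
that $\{\sqrt{2}\,\varphi(2t-n) \mid n\in\mathbb{Z}\}$ is an orthonormal basis for $V_{-1}$; thus, $\varphi(t)$ can be
expressed as

$$\varphi(t) = \sqrt{2}\sum_{n=-\infty}^{\infty} g_0[n]\, \varphi(2t-n). \tag{4.2.8}$$

Note that with the above normalization, $\|g_0[n]\| = 1$ and $g_0[n] = \sqrt{2}\,\langle \varphi(2t-n), \varphi(t)\rangle$
(see Problem 4.2). Taking the Fourier transform of both sides, we obtain

$$\begin{aligned}
\Phi(\omega) = \int_{-\infty}^{\infty} \varphi(t)\, e^{-j\omega t}\, dt &= \sqrt{2}\int_{-\infty}^{\infty} \sum_{n=-\infty}^{\infty} g_0[n]\, \varphi(2t-n)\, e^{-j\omega t}\, dt\\
&= \sqrt{2}\sum_{n=-\infty}^{\infty} g_0[n]\, \frac{1}{2}\int_{-\infty}^{\infty} \varphi(t)\, e^{-j\omega t/2}\, e^{-j\omega n/2}\, dt\\
&= \frac{1}{\sqrt{2}}\sum_{n=-\infty}^{\infty} g_0[n]\, e^{-j(\omega/2)n}\int_{-\infty}^{\infty} \varphi(t)\, e^{-j(\omega/2)t}\, dt\\
&= \frac{1}{\sqrt{2}}\, G_0(e^{j\omega/2})\, \Phi(\omega/2),
\end{aligned} \tag{4.2.9}$$

where

$$G_0(e^{j\omega}) = \sum_{n=-\infty}^{\infty} g_0[n]\, e^{-j\omega n}.$$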

It will be shown that this function characterizes a multiresolution analysis. It is
obviously $2\pi$-periodic and can be viewed as a discrete-time Fourier transform of a
discrete-time filter $g_0[n]$. This last observation links discrete and continuous time,
and allows one to construct continuous-time wavelet bases starting from discrete
iterated filters. It also allows one to compute continuous-time wavelet expansions
using discrete-time algorithms.
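For the Haar case, (4.2.8) holds with $g_0[0] = g_0[1] = 1/\sqrt{2}$, since $\varphi(t) = \varphi(2t) + \varphi(2t-1)$, and the Fourier transform of the indicator of $[0, 1)$ is $\Phi(\omega) = e^{-j\omega/2}\sin(\omega/2)/(\omega/2)$. The sketch below (helper names are illustrative) checks the frequency-domain relation (4.2.9) at a few frequencies; filters are stored as dictionaries $\{n: g_0[n]\}$ so that taps at negative indices would also be handled.

```python
import cmath
import math

# Sketch only: helper names are illustrative; filters are {n: g[n]} dicts.


def dtft(g, w):
    """G(e^{jw}) = sum_n g[n] e^{-jwn} for a finite filter given as {n: g[n]}."""
    return sum(g_n * cmath.exp(-1j * w * n) for n, g_n in g.items())


def haar_phi_hat(w):
    """Fourier transform of the Haar scaling function (indicator of [0, 1))."""
    if w == 0:
        return 1.0 + 0j
    return cmath.exp(-1j * w / 2) * math.sin(w / 2) / (w / 2)


# Two-scale coefficients of the Haar scaling function, cf. (4.2.8).
G0_HAAR = {0: 1 / math.sqrt(2), 1: 1 / math.sqrt(2)}


def two_scale_rhs(g0, phi_hat, w):
    """Right-hand side of (4.2.9): (1/sqrt 2) G0(e^{jw/2}) Phi(w/2)."""
    return dtft(g0, w / 2) * phi_hat(w / 2) / math.sqrt(2)
```

Comparing `two_scale_rhs(G0_HAAR, haar_phi_hat, w)` with `haar_phi_hat(w)` gives agreement to rounding error for any real `w`.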

An important property of $G_0(e^{j\omega})$ is the following:

$$|G_0(e^{j\omega})|^2 + |G_0(e^{j(\omega+\pi)})|^2 = 2. \tag{4.2.10}$$

Note that (4.2.10) was already given in (3.2.54) (again a hint that there is a strong
connection between discrete and continuous time). Equation (4.2.10) can be proven
by using (4.2.7) for $2\omega$:

$$\sum_{k=-\infty}^{\infty} |\Phi(2\omega + 2k\pi)|^2 = 1. \tag{4.2.11}$$

Substituting (4.2.9) into (4.2.11) gives

$$\begin{aligned}
1 &= \frac{1}{2}\sum_{k} |G_0(e^{j(\omega+k\pi)})|^2\, |\Phi(\omega+k\pi)|^2\\
&= \frac{1}{2}\sum_{k} |G_0(e^{j(\omega+2k\pi)})|^2\, |\Phi(\omega+2k\pi)|^2 + \frac{1}{2}\sum_{k} |G_0(e^{j(\omega+(2k+1)\pi)})|^2\, |\Phi(\omega+(2k+1)\pi)|^2\\
&= \frac{1}{2}|G_0(e^{j\omega})|^2 \sum_{k} |\Phi(\omega+2k\pi)|^2 + \frac{1}{2}|G_0(e^{j(\omega+\pi)})|^2 \sum_{k} |\Phi(\omega+(2k+1)\pi)|^2\\
&= \frac{1}{2}\left(|G_0(e^{j\omega})|^2 + |G_0(e^{j(\omega+\pi)})|^2\right),
\end{aligned}$$

where the third line uses the $2\pi$-periodicity of $G_0$ and the last line uses (4.2.7) at $\omega$
and at $\omega + \pi$. This completes the proof of (4.2.10). With a few restrictions on the Fourier
transform $\Phi(\omega)$ (bounded, continuous at $\omega = 0$, and $\Phi(0) \neq 0$), it can be shown that
$G_0(e^{j\omega})$ satisfies

$$|G_0(1)| = \sqrt{2}, \qquad G_0(-1) = 0$$

(see Problem 4.3). Note that the above restrictions on $\Phi(\omega)$ are always satisfied in
practice.
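The power complementarity property (4.2.10) and the values $|G_0(1)| = \sqrt{2}$, $G_0(-1) = 0$ can be checked for two concrete lowpass filters: the Haar filter and Daubechies' 4-tap filter (the one behind Figure 4.4). In the sketch below, the D4 taps are the standard closed-form values, scaled (as an assumption matching the normalization of (4.2.8)) so that $\sum_n g_0[n] = \sqrt{2}$. The helper names and the frequency grid are illustrative.

```python
import cmath
import math

# Sketch only: helper names are illustrative; filters are {n: g[n]} dicts.


def dtft(g, w):
    """G(e^{jw}) = sum_n g[n] e^{-jwn} for a finite filter given as {n: g[n]}."""
    return sum(g_n * cmath.exp(-1j * w * n) for n, g_n in g.items())


def power_complementarity_error(g0, num_points=256):
    """Largest deviation of |G0(e^{jw})|^2 + |G0(e^{j(w+pi)})|^2 from 2, cf. (4.2.10),
    over a uniform grid of frequencies in [0, 2 pi)."""
    worst = 0.0
    for i in range(num_points):
        w = 2 * math.pi * i / num_points
        value = abs(dtft(g0, w)) ** 2 + abs(dtft(g0, w + math.pi)) ** 2
        worst = max(worst, abs(value - 2))
    return worst


R2, R3 = math.sqrt(2), math.sqrt(3)
G0_HAAR = {0: 1 / R2, 1: 1 / R2}
# Daubechies' 4-tap lowpass filter, normalized so that sum_n g0[n] = sqrt(2).
G0_D4 = {0: (1 + R3) / (4 * R2), 1: (3 + R3) / (4 * R2),
         2: (3 - R3) / (4 * R2), 3: (1 - R3) / (4 * R2)}
```

Both filters should report a deviation at the level of floating-point rounding, and `dtft(G0_D4, math.pi)` should be numerically zero, in line with $G_0(-1) = 0$.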

4.2.2 Construction of the Wavelet 

We have shown that a multiresolution analysis is characterized by a $2\pi$-periodic
function $G_0(e^{j\omega})$ with some additional properties. The axioms (4.2.1-4.2.6) guarantee
the existence of bases for approximation spaces $V_m$. The importance of multiresolution
analysis is highlighted by the following theorem. We outline the proof
and show how it leads to the construction of wavelets.

Theorem 4.3 

Whenever the sequence of spaces satisfies (4.2.1-4.2.6), there exists an orthonormal
basis for $L_2(\mathbb{R})$:

$$\psi_{m,n}(t) = 2^{-m/2}\,\psi(2^{-m}t - n), \qquad m,n\in\mathbb{Z},$$

such that $\{\psi_{m,n}\}_{n\in\mathbb{Z}}$ is an orthonormal basis for $W_m$, where $W_m$ is the
orthogonal complement of $V_m$ in $V_{m-1}$.



Proof 



To prove the theorem, let us first establish a couple of important facts. First, we defined
$W_m$ as the orthogonal complement of $V_m$ in $V_{m-1}$. In other words,

$$V_{m-1} = V_m \oplus W_m.$$

By repeating the process and using (4.2.2) we obtain that

$$L_2(\mathbb{R}) = \bigoplus_{m\in\mathbb{Z}} W_m. \tag{4.2.12}$$

Also, due to the scaling property of the $V_m$ spaces (4.2.4), there exists a scaling property
for the $W_m$ spaces as well:

$$f(t)\in W_m \iff f(2^m t)\in W_0. \tag{4.2.13}$$
Our aim here is to explicitly construct² a wavelet $\psi(t)\in W_0$ such that $\psi(t-n)$, $n\in\mathbb{Z}$, is
an orthonormal basis for $W_0$. If we have such a wavelet $\psi(t)$, then by the scaling property
(4.2.13), $\psi_{m,n}(t)$, $n\in\mathbb{Z}$, will be an orthonormal basis for $W_m$. On the other hand, (4.2.12)
together with the upward/downward completeness properties (4.2.2-4.2.3) implies that $\{\psi_{m,n}\}$,
$m,n\in\mathbb{Z}$, is an orthonormal basis for $L_2(\mathbb{R})$, proving the theorem. Thus, we start by
constructing the wavelet $\psi(t)$ such that $\psi\in W_0\subset V_{-1}$. Since $\psi\in V_{-1}$,

$$\psi(t) = \sqrt{2}\sum_{n=-\infty}^{\infty} g_1[n]\, \varphi(2t-n). \tag{4.2.14}$$

Taking the Fourier transform one obtains

$$\Psi(\omega) = \frac{1}{\sqrt{2}}\, G_1(e^{j\omega/2})\, \Phi\!\left(\frac{\omega}{2}\right), \tag{4.2.15}$$

where $G_1(e^{j\omega})$ is a $2\pi$-periodic function from $L_2([0, 2\pi])$. The fact that $\psi(t)$ belongs to $W_0$,
which is orthogonal to $V_0$, implies that

$$\langle \varphi(t-k), \psi(t)\rangle = 0, \quad \text{for all } k.$$

This can also be expressed as (in the Fourier domain)

$$\int_{-\infty}^{\infty} \Psi(\omega)\, \Phi^*(\omega)\, e^{j\omega k}\, d\omega = 0,$$

or equivalently,

$$\int_0^{2\pi} e^{j\omega k} \sum_{l} \Psi(\omega + 2\pi l)\, \Phi^*(\omega + 2\pi l)\, d\omega = 0.$$

This further implies that

$$\sum_{l} \Psi(\omega + 2\pi l)\, \Phi^*(\omega + 2\pi l) = 0. \tag{4.2.16}$$

Now substitute (4.2.9) and (4.2.15) into (4.2.16) and split the sum over $l$ into two sums over
even and odd $l$'s:

$$\begin{aligned}
&\frac{1}{2}\sum_{l} G_1(e^{j(\omega/2+2l\pi)})\, \Phi(\omega/2+2l\pi)\, G_0^*(e^{j(\omega/2+2l\pi)})\, \Phi^*(\omega/2+2l\pi)\\
&+ \frac{1}{2}\sum_{l} G_1(e^{j(\omega/2+(2l+1)\pi)})\, \Phi(\omega/2+(2l+1)\pi)\, G_0^*(e^{j(\omega/2+(2l+1)\pi)})\, \Phi^*(\omega/2+(2l+1)\pi) = 0.
\end{aligned}$$

However, since $G_0$ and $G_1$ are both $2\pi$-periodic, substituting $\Omega$ for $\omega/2$ gives

$$G_1(e^{j\Omega})\, G_0^*(e^{j\Omega}) \sum_{l} |\Phi(\Omega + 2l\pi)|^2 + G_1(e^{j(\Omega+\pi)})\, G_0^*(e^{j(\Omega+\pi)}) \sum_{l} |\Phi(\Omega + (2l+1)\pi)|^2 = 0.$$

²Note that the wavelet we construct is not unique.
Using now (4.2.7), the sums involving $\Phi(\omega)$ become equal to 1, and thus

$$G_1(e^{j\Omega})\, G_0^*(e^{j\Omega}) + G_1(e^{j(\Omega+\pi)})\, G_0^*(e^{j(\Omega+\pi)}) = 0. \tag{4.2.17}$$

Note how (4.2.17) is the same as (3.2.48) in Chapter 3 (on the unit circle). Again, this displays
the connection between discrete and continuous time. Since $G_0^*(e^{j\omega})$ and $G_0^*(e^{j(\omega+\pi)})$
cannot go to zero at the same time (see (4.2.10)), it means that

$$G_1(e^{j\omega}) = A(e^{j\omega})\, G_0^*(e^{j(\omega+\pi)}),$$

where $A(e^{j\omega})$ is $2\pi$-periodic and

$$A(e^{j\omega}) + A(e^{j(\omega+\pi)}) = 0.$$

We can choose $A(e^{j\omega}) = -e^{-j\omega}$ to obtain

$$G_1(e^{j\omega}) = -e^{-j\omega}\, G_0^*(e^{j(\omega+\pi)}), \tag{4.2.18}$$

or, in the time domain (for real $g_0[n]$),

$$g_1[n] = (-1)^n\, g_0[-n+1].$$

Finally, the wavelet is obtained as (see (4.2.15))

$$\Psi(\omega) = -\frac{1}{\sqrt{2}}\, e^{-j\omega/2}\, G_0^*(e^{j(\omega/2+\pi)})\, \Phi(\omega/2), \tag{4.2.19}$$

$$\psi(t) = \sqrt{2}\sum_{n=-\infty}^{\infty} (-1)^n\, g_0[-n+1]\, \varphi(2t-n).$$

To prove that this wavelet, together with its integer shifts, indeed generates an orthonormal
basis for $W_0$, one would have to prove the orthogonality of the basis functions $\psi_{0,n}(t)$ as well as
completeness; that is, that any $f(t)\in W_0$ can be written as $f(t) = \sum_n a_n \psi_{0,n}$. This part
is omitted here and can be found in [73], pp. 134-135.
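The time-domain rule $g_1[n] = (-1)^n g_0[-n+1]$ is easy to apply mechanically. The sketch below (names are illustrative; filters stored as $\{n: g[n]\}$ dictionaries, real taps assumed) builds $g_1$ from $g_0$ and checks (4.2.17) on a frequency grid, together with the time-domain orthogonality $\sum_n g_0[n]\, g_1[n-2k] = 0$. For the Haar filter it returns $g_1[0] = 1/\sqrt{2}$, $g_1[1] = -1/\sqrt{2}$, so that $\psi(t) = \varphi(2t) - \varphi(2t-1)$, which is exactly (4.1.8). The Daubechies 4-tap values are the standard ones, normalized to $\sum_n g_0[n] = \sqrt{2}$.

```python
import cmath
import math

# Sketch only: names are illustrative; real filters stored as {n: g[n]} dicts.


def dtft(g, w):
    """G(e^{jw}) = sum_n g[n] e^{-jwn} for a finite filter given as {n: g[n]}."""
    return sum(g_n * cmath.exp(-1j * w * n) for n, g_n in g.items())


def highpass_from_lowpass(g0):
    """g1[n] = (-1)^n g0[1 - n] for a real filter g0, cf. (4.2.18).

    Tap g0[k] moves to index n = 1 - k and picks up the sign (-1)^n."""
    return {1 - k: (1 if (1 - k) % 2 == 0 else -1) * g_k for k, g_k in g0.items()}


def cross_term(g0, g1, w):
    """Left-hand side of (4.2.17) at frequency w; it should vanish for all w."""
    return (dtft(g1, w) * dtft(g0, w).conjugate()
            + dtft(g1, w + math.pi) * dtft(g0, w + math.pi).conjugate())


R2, R3 = math.sqrt(2), math.sqrt(3)
G0_D4 = {0: (1 + R3) / (4 * R2), 1: (3 + R3) / (4 * R2),
         2: (3 - R3) / (4 * R2), 3: (1 - R3) / (4 * R2)}
```

Other choices of $A(e^{j\omega})$ would shift or change the sign of $g_1[n]$, consistent with the footnote that the wavelet is not unique.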

4.2.3 Examples of Multiresolution Analyses 

In this section we will discuss two examples: Haar, which we encountered in Section
4.1, and sinc, as a dual of the Haar case. The aim is to indicate the embedded
spaces in these two example cases, as well as to show how to construct the wavelets
in these cases.

Example 4.1 Haar Case 

Let us go back to Section 4.1.3. Call $V_m$ the space of functions which are constant over intervals $[n2^m, (n+1)2^m)$. Using (4.1.10), one has

$$f^{(m)} \in V_m \;\Longleftrightarrow\; f^{(m)}(t) = \sum_n f_n^{(m)}\,\varphi_{m,n}(t).$$
The process of taking the average over two successive intervals creates a function $f^{(m+1)} \in V_{m+1}$ (since it is a function which is constant over intervals $[n2^{m+1}, (n+1)2^{m+1})$). Also, it is clear that

$$V_{m+1} \subset V_m.$$

The averaging operation is actually an orthogonal projection of $f^{(m)} \in V_m$ onto $V_{m+1}$, since the difference $d^{(m+1)} = f^{(m)} - f^{(m+1)}$ is orthogonal to $V_{m+1}$ (the inner product of $d^{(m+1)}$ with any function from $V_{m+1}$ is equal to zero). In other words, $d^{(m+1)}$ belongs to a space $W_{m+1}$ which is orthogonal to $V_{m+1}$. The space $W_{m+1}$ is spanned by translates of $\psi_{m+1,n}(t)$:

$$d^{(m+1)} \in W_{m+1} \;\Longleftrightarrow\; d^{(m+1)}(t) = \sum_n d_n^{(m+1)}\,\psi_{m+1,n}(t).$$



This difference function is again the orthogonal projection of $f^{(m)}$ onto $W_{m+1}$. We have seen that any function $f^{(m)}$ can be written as an "average" plus a "difference" function

$$f^{(m)}(t) = f^{(m+1)}(t) + d^{(m+1)}(t). \tag{4.2.20}$$

Thus, $W_{m+1}$ is the orthogonal complement of $V_{m+1}$ in $V_m$. Therefore,

$$V_m = V_{m+1} \oplus W_{m+1}$$

and (4.2.20) can be written as

$$f^{(m)}(t) = \mathrm{Proj}_{V_{m+1}}\big[f^{(m)}(t)\big] + \mathrm{Proj}_{W_{m+1}}\big[f^{(m)}(t)\big].$$

Repeating the process (decomposing $V_{m+1}$ into $V_{m+2} \oplus W_{m+2}$ and so on), the following is obtained:

$$V_m = W_{m+1} \oplus W_{m+2} \oplus W_{m+3} \oplus \cdots$$
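The averaging and differencing step of Example 4.1 can be mimicked on a vector of interval values. In this sketch, a function in $V_m$ is represented by its constant values on consecutive intervals of length $2^m$. With that representation, an $L_2$ inner product is the plain dot product times $2^m$. The helper names `haar_split` and `dot` are hypothetical. The sketch checks the split (4.2.20) and the orthogonality of $d^{(m+1)}$ to $V_{m+1}$.

```python
def haar_split(f):
    """Split samples of f^(m) (one value per interval of length 2^m) into
    f^(m+1) (pairwise averages) and d^(m+1) = f^(m) - f^(m+1), both on the fine grid."""
    if len(f) % 2:
        raise ValueError("need an even number of intervals")
    avg = []
    for n in range(0, len(f), 2):
        a = (f[n] + f[n + 1]) / 2
        avg += [a, a]  # f^(m+1) is constant over each merged interval
    diff = [x - a for x, a in zip(f, avg)]
    return avg, diff

def dot(u, v):
    """Dot product; equals the L2 inner product up to the factor 2^m."""
    return sum(x * y for x, y in zip(u, v))

f = [4.0, 2.0, 5.0, 7.0, 1.0, 1.0]
avg, diff = haar_split(f)
print(avg, diff)
print(dot(avg, diff))
```

Any vector that is constant on consecutive pairs represents an element of $V_{m+1}$, so testing `dot(g, diff)` against such a `g` checks orthogonality to the whole space. It does not check orthogonality to `avg` alone.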

Since piecewise constant functions are dense in $L_2(\mathbb{R})$ as the step size goes to zero, (4.2.2) is satisfied as well as (4.2.12), and thus the Haar wavelets form a basis for $L_2(\mathbb{R})$. Now, let us see how we can construct the Haar wavelet using the technique from the previous section. As we said before, the basis for $V_0$ is $\{\varphi(t-n)\}_{n\in\mathbb{Z}}$ with

$$\varphi(t) = \begin{cases} 1 & 0 \le t < 1, \\ 0 & \text{otherwise.} \end{cases}$$



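To find $G_0(e^{j\omega})$, write

$$\varphi(t) = \varphi(2t) + \varphi(2t-1),$$

hence

$$\Phi(\omega) = \frac{1}{\sqrt{2}}\,\frac{1+e^{-j\omega/2}}{\sqrt{2}}\,\Phi\!\left(\frac{\omega}{2}\right),$$

from which

$$G_0(e^{j\omega}) = \frac{1}{\sqrt{2}}\,(1+e^{-j\omega}).$$

Then, by using (4.2.18),

$$G_1(e^{j\omega}) = -e^{-j\omega}\,G_0^*\big(e^{j(\omega+\pi)}\big) = \frac{1-e^{-j\omega}}{\sqrt{2}},$$

one obtains

$$\Psi(\omega) = \frac{1}{\sqrt{2}}\,G_1(e^{j\omega/2})\,\Phi\!\left(\frac{\omega}{2}\right).$$

Finally

$$\psi(t) = \varphi(2t) - \varphi(2t-1),$$

or

$$\psi(t) = \begin{cases} 1 & 0 \le t < \frac{1}{2}, \\ -1 & \frac{1}{2} \le t < 1, \\ 0 & \text{otherwise.} \end{cases}$$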

The Haar wavelet and scaling function, as well as their Fourier transforms, were given in 
Figure 4.1. 

Example 4.2 Sinc Case

In order to derive the sinc wavelet,³ we will start with the sequence of embedded spaces. Instead of piecewise constant functions, we will consider bandlimited functions. Call $V_0$ the space of functions bandlimited to $[-\pi, \pi]$ (to be precise, $V_0$ includes $\cos(\pi t)$ but not $\sin(\pi t)$). Thus, $V_{-1}$ is the space of functions bandlimited to $[-2\pi, 2\pi]$. Then, call $W_0$ the space of functions bandlimited to $[-2\pi, -\pi] \cup [\pi, 2\pi]$ (again, to be precise, $W_0$ includes $\sin(\pi t)$ but not $\cos(\pi t)$). Therefore

$$V_{-1} = V_0 \oplus W_0,$$

since $V_0$ is orthogonal to $W_0$ and together they span the same space as $V_{-1}$ (see Figure 4.5). Obviously, a projection of a function $f^{(-1)}$ from $V_{-1}$ onto $V_0$ will be a lowpass approximation $f^{(0)}$, while the difference $d^{(0)} = f^{(-1)} - f^{(0)}$ will exist in $W_0$. Repeating the above decomposition leads to

$$V_{-1} = \bigoplus_{m=0}^{\infty} W_m,$$

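as shown in Figure 4.5. This is an octave-band decomposition of $V_{-1}$. It is also called a constant-Q filtering, since each band has a constant relative bandwidth. It is clear that an orthogonal basis for $V_0$ is given by $\{\mathrm{sinc}_1(t-n)\}$ (see (4.1.4)), or

$$\varphi(t) = \frac{\sin \pi t}{\pi t},$$

which is thus the scaling function for the sinc case and the space $V_0$ of functions bandlimited to $[-\pi, \pi]$. Using (4.2.9) one gets that

$$g_0[n] = \frac{1}{\sqrt{2}}\,\frac{\sin(\pi n/2)}{\pi n/2}, \tag{4.2.21}$$

that is,

$$G_0(e^{j\omega}) = \begin{cases} \sqrt{2} & |\omega| < \frac{\pi}{2}, \\ 0 & \text{otherwise,} \end{cases}$$

or, $G_0(e^{j\omega})$ is an ideal lowpass filter. Then $G_1(e^{j\omega})$ becomes (use (4.2.18))

$$G_1(e^{j\omega}) = \begin{cases} -\sqrt{2}\,e^{-j\omega} & \omega \in [-\pi, -\frac{\pi}{2}] \cup [\frac{\pi}{2}, \pi], \\ 0 & \text{otherwise,} \end{cases}$$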



3 In the mathematical literature, this is often referred to as the Littlewood-Paley wavelet [73]. 
Figure 4.5 Decomposition of $V_0$ into successive octave bands. Actually, there is a scaling factor for $V_j$ and $W_j$ by $2^{j/2}$ to make the subspaces of unit norm.



which is an ideal highpass filter with a phase shift. The sequence $g_1[n]$ is then

$$g_1[n] = (-1)^n\,g_0[-n+1], \tag{4.2.22}$$

whereupon

$$\psi(t) = \sqrt{2}\sum_n (-1)^{-n+1}\,g_0[n]\,\varphi(2t+n-1).$$



Alternatively, we can construct the wavelet directly by taking the inverse Fourier transform of the indicator function of the intervals $[-2\pi, -\pi] \cup [\pi, 2\pi]$:

$$\psi(t) = \frac{1}{2\pi}\int_{-2\pi}^{-\pi} e^{j\omega t}\,d\omega + \frac{1}{2\pi}\int_{\pi}^{2\pi} e^{j\omega t}\,d\omega = 2\,\frac{\sin(2\pi t)}{2\pi t} - \frac{\sin(\pi t)}{\pi t} = \frac{\sin(\pi t/2)}{\pi t/2}\,\cos(3\pi t/2). \tag{4.2.23}$$

This function is orthogonal to its translates by integers, or $\langle \psi(t), \psi(t-n)\rangle = \delta[n]$, as can be verified using Parseval's formula (2.4.11). To be coherent with our definition of $W_0$ (which excludes $\cos(\pi t)$), we need to shift $\psi(t)$ by 1/2, and thus $\{\psi(t-n-1/2)\}$, $n \in \mathbb{Z}$, is an orthogonal basis for $W_0$. The wavelet basis is now given by



$$\left\{\psi_{m,n}(t) = 2^{-m/2}\,\psi\!\left(2^{-m}t - n - \tfrac{1}{2}\right)\right\}, \qquad m, n \in \mathbb{Z},$$

where $\psi_{m,n}(t)$, $n \in \mathbb{Z}$, is a basis for functions whose Fourier transform is supported on

$$[-2^{-m+1}\pi, -2^{-m}\pi] \cup [2^{-m}\pi, 2^{-m+1}\pi].$$

Since $m$ can be arbitrarily large (positive or negative), it is clear that we have a basis for $L_2(\mathbb{R})$ functions. The wavelet, scaling function, and their Fourier transforms are shown in Figure 4.6. The slow decay of the time-domain function ($1/t$ as $t \to \infty$) can be seen in the figure, while the frequency resolution is obviously ideal.
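A quick numerical check of (4.2.21) and (4.2.23) can be written as a short Python sketch. It compares the two closed forms of the sinc wavelet at a few points away from $t = 0$. It also confirms that a truncated $g_0[n]$ has nearly unit norm and is nearly orthogonal to its shift by two. The truncation length `N` is an arbitrary choice. Because $g_0[n]$ decays only like $1/n$, the sums converge slowly, and the tolerances in the check reflect that.

```python
import math

def sinc_wavelet_difference(t):
    """(4.2.23), middle form: 2 sin(2 pi t)/(2 pi t) - sin(pi t)/(pi t), for t != 0."""
    return (2 * math.sin(2 * math.pi * t) / (2 * math.pi * t)
            - math.sin(math.pi * t) / (math.pi * t))

def sinc_wavelet_product(t):
    """(4.2.23), right-hand form: sin(pi t/2)/(pi t/2) * cos(3 pi t/2), for t != 0."""
    return math.sin(math.pi * t / 2) / (math.pi * t / 2) * math.cos(3 * math.pi * t / 2)

def g0(n):
    """Ideal half-band lowpass of (4.2.21)."""
    if n == 0:
        return 1 / math.sqrt(2)
    return math.sin(math.pi * n / 2) / (math.pi * n / 2) / math.sqrt(2)

N = 20000  # truncation: indices -N..N
norm2 = sum(g0(n) ** 2 for n in range(-N, N + 1))
shift2 = sum(g0(n) * g0(n - 2) for n in range(-N, N + 1))
print(norm2, shift2)
```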

To conclude this section, we summarize the expressions for the scaling function and the wavelet as well as their Fourier transforms in Haar and sinc cases in Table 4.1. The underlying discrete-time filters were given in Table 3.1.
Figure 4.6 Scaling function and the wavelet in the sinc case. (a) Scaling function $\varphi(t)$. (b) Fourier transform magnitude $|\Phi(\omega)|$. (c) Wavelet $\psi(t)$. (d) Fourier transform magnitude $|\Psi(\omega)|$.



4.3 Construction of Wavelets Using Fourier Techniques 



What we have seen until now is the conceptual framework for building orthonormal bases with the specific structure of multiresolution analysis, as well as two particular cases of such bases: Haar and sinc. We will now concentrate on ways of building
such bases in the Fourier domain. Two constructions are indicated, both of which 
rely on the multiresolution framework derived in the previous section. First, Meyer's 
wavelet is derived, showing step by step how it verifies the multiresolution axioms. 
Then, wavelets for spline spaces are constructed. In this case, one starts with 
the well-known spaces of piecewise polynomials and shows how to construct an 
orthonormal wavelet basis. 
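Table 4.1 Scaling functions, wavelets and their Fourier transforms in the Haar and sinc cases. The underlying discrete-time filters are given in Table 3.1.

$$
\begin{array}{c|c|c}
 & \text{Haar} & \text{Sinc} \\ \hline
\varphi(t) & \begin{cases} 1 & 0 \le t < 1 \\ 0 & \text{otherwise} \end{cases} & \dfrac{\sin \pi t}{\pi t} \\[3ex]
\psi(t) & \begin{cases} 1 & 0 \le t < \frac{1}{2} \\ -1 & \frac{1}{2} \le t < 1 \\ 0 & \text{otherwise} \end{cases} & \dfrac{\sin \pi(t/2 - 1/4)}{\pi(t/2 - 1/4)}\,\cos\big(3\pi(t/2 - 1/4)\big) \\[3ex]
\Phi(\omega) & e^{-j\omega/2}\,\dfrac{\sin \omega/2}{\omega/2} & \begin{cases} 1 & |\omega| < \pi \\ 0 & \text{otherwise} \end{cases} \\[3ex]
\Psi(\omega) & j\,e^{-j\omega/2}\,\dfrac{(\sin \omega/4)^2}{\omega/4} & \begin{cases} -e^{-j\omega/2} & \pi < |\omega| < 2\pi \\ 0 & \text{otherwise} \end{cases}
\end{array}
$$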




Figure 4.7 Construction of Meyer's wavelet. (a) General form of the function $\theta(x)$. (b) $|\Phi(\omega)|$ in Meyer's construction.



4.3.1 Meyer's Wavelet 

The idea behind Meyer's wavelet is to soften the ideal sinc case. Recall that the sinc scaling function and the wavelet are as given in Figure 4.6. The idea of the proof is to construct a scaling function $\varphi(t)$ that satisfies the orthogonality and scaling requirements of the multiresolution analysis and then construct the wavelet using the standard method. In order to soften the sinc scaling function, we find a smooth function (in frequency) that satisfies (4.2.7).

We are going to show the construction step by step, leading first to the scaling 
function and then to the associated wavelet. 



Figure 4.8 Pictorial proof that $\{\varphi(t-n)\}_{n\in\mathbb{Z}}$ form an orthonormal family in $L_2(\mathbb{R})$.

(a) Start with a nonnegative function $\theta(x)$ that is differentiable (maybe several times) and such that (see Figure 4.7(a))

$$\theta(x) = \begin{cases} 0 & x \le 0, \\ 1 & 1 \le x, \end{cases} \tag{4.3.1}$$

and satisfying $\theta(x) + \theta(1-x) = 1$ for $0 \le x \le 1$. There exist various choices for $\theta(x)$, one of them being

$$\theta(x) = \begin{cases} 0 & x \le 0, \\ 3x^2 - 2x^3 & 0 \le x \le 1, \\ 1 & 1 \le x. \end{cases}$$

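(b) Construct the scaling function $\Phi(\omega)$ such that (see Figure 4.7(b))

$$\Phi(\omega) = \begin{cases} \sqrt{\theta\!\left(2 + \frac{3\omega}{2\pi}\right)} & \omega \le 0, \\[1ex] \sqrt{\theta\!\left(2 - \frac{3\omega}{2\pi}\right)} & 0 \le \omega. \end{cases} \tag{4.3.2}$$

To show that $\Phi(\omega)$ is indeed a scaling function with a corresponding multiresolution analysis, one has to show that (4.2.1)-(4.2.6) hold. As a preliminary step, let us first demonstrate the following:

(c) $\{\varphi(t-n)\}_{n\in\mathbb{Z}}$ is an orthonormal family from $L_2(\mathbb{R})$. To that end, we use the Poisson formula and instead show that (see (4.2.7))

$$\sum_{k\in\mathbb{Z}} |\Phi(\omega + 2k\pi)|^2 = 1. \tag{4.3.3}$$

From Figure 4.8 it is clear that for $\omega \in [-2\pi/3 - 2m\pi,\ 2\pi/3 - 2m\pi]$

$$\sum_{k\in\mathbb{Z}} |\Phi(\omega + 2k\pi)|^2 = |\Phi(\omega + 2m\pi)|^2 = 1.$$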
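The only thing left is to show that (4.3.3) holds in the overlapping regions. Thus, take, for example, $\omega \in [2\pi/3, 4\pi/3]$:

$$|\Phi(\omega)|^2 + |\Phi(\omega - 2\pi)|^2 = \theta\!\left(2 - \frac{3\omega}{2\pi}\right) + \theta\!\left(2 + \frac{3(\omega - 2\pi)}{2\pi}\right) = \theta\!\left(2 - \frac{3\omega}{2\pi}\right) + \theta\!\left(-1 + \frac{3\omega}{2\pi}\right) = 1.$$

The last equation follows from the property $\theta(x) + \theta(1-x) = 1$ in the definition of $\theta$ (see (4.3.1)), since the two arguments add up to 1.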

(d) Define as $V_0$ the subspace of $L_2(\mathbb{R})$ generated by $\varphi(t-n)$ and define as $V_m$'s those satisfying (4.2.4).

Now we are ready to show that the $V_m$'s form a multiresolution analysis. Until now, by definition, we have taken care of (4.2.4)-(4.2.6); those left to be shown are (4.2.1)-(4.2.3).



Figure 4.9 Pictorial proof of (4.2.9).



(e) Prove (4.2.1): It is enough to show that $V_1 \subset V_0$, or $\varphi(t/2) = \sum_n c_n\,\varphi(t-n)$. This is equivalent to saying that there exists a periodic function $G_0(e^{j\omega}) \in L_2([0, 2\pi])$ such that $\Phi(2\omega) = (1/\sqrt{2})\,G_0(e^{j\omega})\,\Phi(\omega)$ (see (4.2.9)). Then choose

$$G_0(e^{j\omega}) = \sqrt{2}\sum_{k\in\mathbb{Z}} \Phi(2\omega + 4k\pi). \tag{4.3.4}$$

A pictorial proof is given in Figure 4.9.
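(f) Show (4.2.2): In this case, it is enough to show that

$$\langle f, \varphi_{m,n}\rangle = 0,\quad m, n \in \mathbb{Z} \;\Longrightarrow\; f = 0.$$

Now

$$\langle f, \varphi_{m,n}\rangle = 0 \;\Longleftrightarrow\; \sum_{k} F\big(2^m(\omega + 2k\pi)\big)\,\Phi^*(\omega + 2k\pi) = 0.$$

Take, for example, $\omega \in [-2\pi/3, 2\pi/3]$. Then for any $k \neq 0$ (since $\Phi(\omega + 2k\pi) = 0$ there)

$$F\big(2^m(\omega + 2k\pi)\big)\,\Phi^*(\omega + 2k\pi) = 0,$$

and for $k = 0$

$$F(2^m\omega)\,\Phi^*(\omega) = 0.$$

For any $m$, since $\Phi(\omega) = 1$ on this interval,

$$F(2^m\omega) = 0, \qquad \omega \in [-2\pi/3, 2\pi/3],$$

and thus

$$F(\omega) = 0, \qquad \omega \in \mathbb{R},$$

or

$$f = 0.$$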



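(g) Show (4.2.3): If $f \in \bigcap_{m\in\mathbb{Z}} V_m$ then $F \in \bigcap_{m\in\mathbb{Z}} \mathcal{F}\{V_m\}$, where $\mathcal{F}\{V_m\}$ is the Fourier transform of $V_m$ with the basis $2^{m/2}e^{-jk\omega 2^{-m}}\,\Phi(2^{-m}\omega)$. Since $\Phi(2^{-m}\omega)$ has its support in the interval

$$I = \left[-\frac{4\pi}{3}\,2^{m},\ \frac{4\pi}{3}\,2^{m}\right],$$

it follows that $I \to \{0\}$ as $m \to -\infty$. In other words,

$$F(\omega) \in \bigcap_{m\in\mathbb{Z}} \mathcal{F}\{V_m\} = \{0\},$$

or

$$f(t) = 0.$$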
Figure 4.10 Pictorial construction of Meyer's wavelet.



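(h) Finally, one just has to find the corresponding wavelet using (4.2.19):

$$\Psi(\omega) = -\frac{1}{\sqrt{2}}\,e^{-j\omega/2}\,G_0^*\big(e^{j(\omega/2+\pi)}\big)\,\Phi\!\left(\frac{\omega}{2}\right).$$

Thus, using (4.3.4), one gets

$$\Psi(\omega) = -e^{-j\omega/2}\sum_{k\in\mathbb{Z}} \Phi\big(\omega + (4k+2)\pi\big)\,\Phi\!\left(\frac{\omega}{2}\right).$$

Hence $\Psi(\omega)$ is defined as follows (see Figure 4.10):

$$\Psi(\omega) = \begin{cases} 0 & 0 \le \omega \le \frac{2\pi}{3}, \\[0.5ex] -e^{-j\omega/2}\,\Phi(\omega - 2\pi) & \frac{2\pi}{3} \le \omega \le \frac{4\pi}{3}, \\[0.5ex] -e^{-j\omega/2}\,\Phi\!\left(\frac{\omega}{2}\right) & \frac{4\pi}{3} \le \omega \le \frac{8\pi}{3}, \\[0.5ex] 0 & \frac{8\pi}{3} \le \omega, \end{cases} \tag{4.3.5}$$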



and $|\Psi(\omega)|$ is an even function of $\omega$ (that is, $\Psi(\omega)$ is even except for the phase factor $e^{-j\omega/2}$). Note that (see Problem 4.4)

$$\sum_{k\in\mathbb{Z}} |\Psi(2^k\omega)|^2 = 1. \tag{4.3.6}$$
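The Meyer construction can be checked numerically with the standard library alone. This sketch uses the cubic choice $\theta(x) = 3x^2 - 2x^3$ from step (a). It builds $|\Phi(\omega)|$ from (4.3.2) and $|\Psi(\omega)|$ from (4.3.5); the phase factor is dropped, since only magnitudes enter. It then evaluates truncated versions of the periodized sum (4.3.3) and of the dyadic sum $\sum_k |\Psi(2^k\omega)|^2$, both of which should equal 1. The truncation limits `K` are illustrative. For a given $\omega \neq 0$, both sums have only finitely many nonzero terms.

```python
import math

def theta(x):
    """Smooth step of (4.3.1), with the cubic choice 3x^2 - 2x^3 on [0, 1]."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return 3 * x ** 2 - 2 * x ** 3

def meyer_phi(w):
    """Phi(w) from (4.3.2); real, even, equal to 1 for |w| <= 2pi/3, 0 for |w| >= 4pi/3."""
    return math.sqrt(theta(2 - 3 * abs(w) / (2 * math.pi)))

def meyer_psi_mag(w):
    """|Psi(w)| from (4.3.5), extended to negative w by symmetry."""
    a = abs(w)
    if 2 * math.pi / 3 <= a <= 4 * math.pi / 3:
        return meyer_phi(a - 2 * math.pi)
    if 4 * math.pi / 3 <= a <= 8 * math.pi / 3:
        return meyer_phi(a / 2)
    return 0.0

def periodized_energy(w, K=5):
    """Truncated left side of (4.3.3): sum over |k| <= K of |Phi(w + 2 k pi)|^2."""
    return sum(meyer_phi(w + 2 * k * math.pi) ** 2 for k in range(-K, K + 1))

def dyadic_energy(w, K=30):
    """Truncated sum over |k| <= K of |Psi(2^k w)|^2."""
    return sum(meyer_psi_mag(2.0 ** k * w) ** 2 for k in range(-K, K + 1))

print([round(periodized_energy(w), 12) for w in (0.0, 2.5, 3.9)])
print([round(dyadic_energy(w), 12) for w in (0.3, 5.0, 100.0)])
```

The overlap regions are where the property $\theta(x) + \theta(1-x) = 1$ does the work. Any other $\theta$ with that property could be substituted in `theta` without changing either check.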



An example of Meyer's scaling function and wavelet is shown in Figure 4.11. A few remarks can be made on Meyer's wavelet. The time-domain function, while of infinite support, can have very fast decay. The discrete-time filter $G_0(e^{j\omega})$ which is involved in the two-scale equation corresponds (by inverse Fourier transform) to a sequence $g_0[n]$ which has similarly fast decay. However, $G_0(e^{j\omega})$ is not a rational function of $e^{-j\omega}$ and thus, the filter $g_0[n]$ cannot be efficiently implemented. Thus, Meyer's wavelet is more of theoretical interest.

Figure 4.11 Meyer's scaling function and the wavelet. (a) Scaling function $\varphi(t)$. (b) Fourier transform magnitude $|\Phi(\omega)|$. (c) Wavelet $\psi(t)$. (d) Fourier transform magnitude $|\Psi(\omega)|$.



4.3.2 Wavelet Bases for Piecewise Polynomial Spaces 

Spline or Piecewise Polynomial Spaces Spaces which are both interesting and easy to characterize are the spaces of piecewise polynomial functions. To be more precise, they are polynomials of degree $l$ over fixed-length intervals and at the knots (the boundary between intervals) they have continuous derivatives up to order $l-1$. Two characteristics of such spaces make them well suited for the development of wavelet bases. First, there is a ladder of spaces as required for a multiresolution construction of wavelets. Functions which are piecewise polynomial of degree $l$ over intervals $[k2^i, (k+1)2^i)$ are obviously also piecewise polynomial over subintervals $[k2^j, (k+1)2^j)$, $j < i$. Second, there exist simple bases for such spaces, namely the B-splines. Call:

$$V_i^{(l)} = \left\{\begin{array}{l}\text{functions which are piecewise polynomial of degree } l \\ \text{over intervals } [k2^i, (k+1)2^i) \text{ and having } l-1 \\ \text{continuous derivatives at } k2^i,\ k \in \mathbb{Z}\end{array}\right\}.$$

For example, $V_{-1}^{(1)}$ is the space of all functions which are linear over half-integer intervals and continuous at the interval boundaries. Consider first the spaces with unit intervals, that is, $V_0^{(l)}$. Then, bases for these spaces are given by the B-splines [76, 255]. These are obtained by convolution of box functions (indicator functions of the unit interval) with themselves. For example, the hat function, which is a box function convolved with itself, is a (nonorthogonal) basis for piecewise linear functions over unit intervals, that is, $V_0^{(1)}$.

The idea of the wavelet construction is to start with these nonorthogonal bases for the $V_0^{(l)}$'s and apply a suitable orthogonalization procedure in order to get an orthogonal scaling function. Then, the wavelet follows from the usual construction. Below, we follow the approach and notation of Unser and Aldroubi [6, 298, 299, 296]. Note that the relation between splines and digital filters has also been exploited in [118].

Call $I(t)$ the indicator function of the interval $[-1/2, 1/2]$ and $I^{(k)}(t)$ the $k$-time convolution of $I(t)$ with itself, that is, $I^{(k)}(t) = I(t) * I^{(k-1)}(t)$, $I^{(0)}(t) = I(t)$. Denote by $\beta^{(N)}(t)$ the B-spline of order $N$ where

(a) for $N$ odd:

$$\beta^{(N)}(t) = I^{(N)}(t), \tag{4.3.7}$$

$$B^{(N)}(\omega) = \left(\frac{\sin(\omega/2)}{\omega/2}\right)^{N+1}, \tag{4.3.8}$$

(b) and for $N$ even:

$$\beta^{(N)}(t) = I^{(N)}\!\left(t - \frac{1}{2}\right), \tag{4.3.9}$$

$$B^{(N)}(\omega) = e^{-j\omega/2}\left(\frac{\sin(\omega/2)}{\omega/2}\right)^{N+1}. \tag{4.3.10}$$

The shift by 1/2 in (4.3.9) is necessary so that the nodes of the spline are at integer intervals. The first few examples, namely $N = 0$ (constant spline), $N = 1$ (linear spline), and $N = 2$ (quadratic spline) are shown in Figure 4.12.
Figure 4.12 B-splines, for $N = 0, 1, 2$. (a) Constant spline. (b) Linear spline. (c) Quadratic spline.



Orthogonalization Procedure While the B-spline $\beta^{(N)}(t)$ and its integer translates form a basis for $V_0^{(N)}$, it is not an orthogonal basis (except for $N = 0$). Therefore, we have to apply an orthogonalization procedure. Recall that a function $f(t)$ that is orthogonal to its integer translates satisfies (see (4.2.7))

$$\langle f(t), f(t-n)\rangle = \delta[n],\ n \in \mathbb{Z} \;\Longleftrightarrow\; \sum_{k\in\mathbb{Z}} |F(\omega + 2k\pi)|^2 = 1.$$

Starting with a nonorthogonal $\beta^{(N)}(t)$, we can evaluate the following $2\pi$-periodic function:

$$B^{(2N+1)}(\omega) = \sum_{k\in\mathbb{Z}} |B^{(N)}(\omega + 2k\pi)|^2. \tag{4.3.11}$$
In this case⁴ $B^{(2N+1)}(\omega)$ is the discrete-time Fourier transform of the discrete-time B-spline $b^{(2N+1)}[n]$, which is the sampled version of the continuous-time B-spline [299],

$$b^{(2N+1)}[n] = \beta^{(2N+1)}(t)\big|_{t=n}. \tag{4.3.12}$$

Because $\{\beta^{(N)}(t-n)\}$ is a basis for $V_0^{(N)}$, one can show that there exist two positive constants $A$ and $C$ such that [71]

$$0 < A \le B^{(2N+1)}(\omega) \le C < \infty. \tag{4.3.13}$$

One possible choice for a scaling function is

$$\Phi(\omega) = \frac{B^{(N)}(\omega)}{\sqrt{B^{(2N+1)}(\omega)}}. \tag{4.3.14}$$

Because of (4.3.13), $\Phi(\omega)$ is well defined. Obviously

$$\sum_k |\Phi(\omega + 2k\pi)|^2 = \frac{1}{B^{(2N+1)}(\omega)}\sum_k |B^{(N)}(\omega + 2k\pi)|^2 = 1,$$

and thus, the set $\{\varphi(t-n)\}$ is orthogonal. That it is a basis for $V_0^{(N)}$ follows from the fact that (from (4.3.14)) $\beta^{(N)}(t)$ can be written as a linear combination of $\varphi(t-n)$ and therefore, since any $f(t) \in V_0^{(N)}$ can be written in terms of $\beta^{(N)}(t-n)$, it can be expressed in terms of $\varphi(t-n)$ as well.

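Now, both $\beta^{(N)}(t)$ and $\varphi(t)$ satisfy a two-scale equation because they belong to $V_0^{(N)}$ and thus $V_{-1}^{(N)}$; therefore, they can be expressed in terms of $\beta^{(N)}(2t-n)$ and $\varphi(2t-n)$, respectively. In the Fourier domain we have

$$B^{(N)}(\omega) = M\!\left(\frac{\omega}{2}\right) B^{(N)}\!\left(\frac{\omega}{2}\right), \tag{4.3.15}$$

$$\Phi(\omega) = \frac{1}{\sqrt{2}}\,G_0(e^{j\omega/2})\,\Phi\!\left(\frac{\omega}{2}\right), \tag{4.3.16}$$

where we used (4.2.9) for $\Phi(\omega)$. Using (4.3.14) and (4.3.15), we find that

$$\Phi(\omega) = M\!\left(\frac{\omega}{2}\right)\sqrt{\frac{B^{(2N+1)}(\omega/2)}{B^{(2N+1)}(\omega)}}\;\Phi\!\left(\frac{\omega}{2}\right), \tag{4.3.17}$$

that is,

$$G_0(e^{j\omega}) = \sqrt{2}\,M(\omega)\,\sqrt{\frac{B^{(2N+1)}(\omega)}{B^{(2N+1)}(2\omega)}}. \tag{4.3.18}$$

4 Note that $\beta^{(N)}(t)$ has a Fourier transform $B^{(N)}(\omega)$. On the other hand, $b^{(2N+1)}[n]$ has a discrete-time Fourier transform $B^{(2N+1)}(\omega)$. $B^{(N)}(\omega)$ and $B^{(2N+1)}(\omega)$ should not be confused. Also, $B^{(2N+1)}(\omega)$ is a function of $e^{j\omega}$.

Then, following (4.2.19), we have the following expression for the wavelet:

$$\Psi(\omega) = -\frac{1}{\sqrt{2}}\,e^{-j\omega/2}\,G_0^*\big(e^{j(\omega/2+\pi)}\big)\,\Phi\!\left(\frac{\omega}{2}\right). \tag{4.3.19}$$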

Note that the orthogonalization method just described is quite general and can be applied whenever we have a multiresolution analysis with nested spaces and a basis for $V_0$. In particular, it indicates that in Definition 4.2, $\varphi(t)$ in (4.2.6) need not be from an orthogonal basis since it can be orthogonalized using the above method. That is, given $g(t)$ which forms a (nonorthogonal) basis for $V_0$ and satisfies a two-scale equation, compute a $2\pi$-periodic function $D(\omega)$

$$D(\omega) = \sum_{k\in\mathbb{Z}} |G(\omega + 2k\pi)|^2, \tag{4.3.20}$$

where $G(\omega)$ is the Fourier transform of $g(t)$. Then

$$\Phi(\omega) = \frac{G(\omega)}{\sqrt{D(\omega)}}$$

corresponds to an orthogonal scaling function for $V_0$ and the rest of the procedure follows as above.

Orthonormal Wavelets for Spline Spaces We will apply the method just described to construct wavelets for spaces of piecewise polynomial functions introduced at the beginning of this section. This construction was done by Battle [21, 22] and Lemarie [175], and the resulting wavelets are often called Battle-Lemarie wavelets. Earlier work by Stromberg [283, 284] also derived orthogonal wavelets for piecewise polynomial spaces. We will start with a simple example of the linear spline, given by

$$\beta^{(1)}(t) = \begin{cases} 1 - |t| & |t| \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{4.3.21}$$

It satisfies the following two-scale equation:

$$\beta^{(1)}(t) = \frac{1}{2}\,\beta^{(1)}(2t+1) + \beta^{(1)}(2t) + \frac{1}{2}\,\beta^{(1)}(2t-1).$$

The Fourier transform, from (4.3.7), is

$$B^{(1)}(\omega) = \left(\frac{\sin(\omega/2)}{\omega/2}\right)^2. \tag{4.3.22}$$
In order to find $B^{(2N+1)}(\omega)$ (see (4.3.11)), we note that its inverse Fourier transform is equal to

$$b^{(2N+1)}[n] = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{jn\omega}\,|B^{(N)}(\omega)|^2\,d\omega = \int_{-\infty}^{\infty}\beta^{(N)}(t)\,\beta^{(N)}(t-n)\,dt, \tag{4.3.23}$$

by Parseval's formula (2.4.11). In the linear spline case, we find $b^{(3)}[0] = 2/3$ and $b^{(3)}[1] = b^{(3)}[-1] = 1/6$, or

$$B^{(3)}(\omega) = \frac{2}{3} + \frac{1}{6}\,e^{j\omega} + \frac{1}{6}\,e^{-j\omega} = 1 - \frac{2}{3}\sin^2\!\left(\frac{\omega}{2}\right),$$

which is the discrete-time cubic spline [299]. From (4.3.14) and (4.3.22), one gets

$$\Phi(\omega) = \frac{\sin^2(\omega/2)}{(\omega/2)^2\,\big(1 - (2/3)\sin^2(\omega/2)\big)^{1/2}},$$

which is an orthonormal scaling function for the linear spline space $V_0^{(1)}$.
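For the linear spline these quantities are easy to check numerically. This standard-library sketch uses function names of our own choosing and performs three checks:

- It approximates $b^{(3)}[n]$ through the time-domain integral in (4.3.23) with a midpoint rule.
- It compares a truncated periodization (4.3.11) with the closed form $1 - (2/3)\sin^2(\omega/2)$.
- It confirms that $\Phi(\omega)$ from (4.3.14) satisfies the orthonormality condition (4.2.7).

The step count and the truncation `K` are arbitrary accuracy settings.

```python
import math

def hat(t):
    """Linear B-spline beta^(1)(t) = max(0, 1 - |t|)."""
    return max(0.0, 1.0 - abs(t))

def b3(n, steps=20000):
    """(4.3.23): integral of beta1(t) beta1(t - n) dt, midpoint rule on [-1, 1]."""
    h = 2.0 / steps
    total = 0.0
    for i in range(steps):
        t = -1.0 + (i + 0.5) * h
        total += hat(t) * hat(t - n)
    return total * h

def B1(w):
    """(4.3.22): (sin(w/2)/(w/2))^2, with the removable singularity at w = 0."""
    if w == 0.0:
        return 1.0
    return (math.sin(w / 2) / (w / 2)) ** 2

def B3(w):
    """Discrete-time cubic spline: 2/3 + (1/3) cos w = 1 - (2/3) sin^2(w/2)."""
    return 1.0 - (2.0 / 3.0) * math.sin(w / 2) ** 2

def Phi(w):
    """Orthonormalized scaling function (4.3.14) for N = 1."""
    return B1(w) / math.sqrt(B3(w))

def periodize_sq(F, w, K=2000):
    """Truncated sum over |k| <= K of |F(w + 2 k pi)|^2."""
    return sum(F(w + 2 * k * math.pi) ** 2 for k in range(-K, K + 1))

print(b3(0), b3(1))
print(periodize_sq(B1, 1.5), B3(1.5), periodize_sq(Phi, 1.5))
```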
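Observation of the inverse Fourier transform of the $2\pi$-periodic function $(1 - (2/3)\sin^2(\omega/2))^{-1/2}$, which corresponds to a sequence $\{\alpha_n\}$, indicates that $\varphi(t)$ can be written as a linear combination of $\{\beta^{(1)}(t-n)\}$:

$$\varphi(t) = \sum_{n\in\mathbb{Z}} \alpha_n\,\beta^{(1)}(t-n).$$

This function is thus piecewise linear as can be verified in Figure 4.13(a). Taking the Fourier transform of the two-scale equation (4.3.21) leads to

$$B^{(1)}(\omega) = \left(\frac{1}{4}\,e^{j\omega/2} + \frac{1}{2} + \frac{1}{4}\,e^{-j\omega/2}\right) B^{(1)}\!\left(\frac{\omega}{2}\right) = \frac{1}{2}\left(1 + \cos\frac{\omega}{2}\right) B^{(1)}\!\left(\frac{\omega}{2}\right),$$

and following the definition of $M(\omega)$ in (4.3.15), we get

$$M(\omega) = \frac{1}{2}\,(1 + \cos\omega) = \cos^2\!\left(\frac{\omega}{2}\right).$$

Therefore, $G_0(e^{j\omega})$ is equal to (following (4.3.18))

$$G_0(e^{j\omega}) = \sqrt{2}\,\frac{\cos^2(\omega/2)\,\big(1 - (2/3)\sin^2(\omega/2)\big)^{1/2}}{\big(1 - (2/3)\sin^2\omega\big)^{1/2}}$$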
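As a sanity check on the linear-spline lowpass filter just derived, the following Python sketch verifies power complementarity, $|G_0(e^{j\omega})|^2 + |G_0(e^{j(\omega+\pi)})|^2 = 2$, which an orthogonal two-channel lowpass must satisfy. It also computes a few taps $g_0[n]$ by a numerical inverse DTFT, to illustrate that the filter is infinite but decays fast. The number of quadrature points and the range $|n| \le 30$ are arbitrary choices, and the function names are hypothetical.

```python
import math

def G0(w):
    """Linear-spline lowpass filter from (4.3.18) with N = 1 (real and even in w)."""
    num = math.cos(w / 2) ** 2 * math.sqrt(1 - (2 / 3) * math.sin(w / 2) ** 2)
    den = math.sqrt(1 - (2 / 3) * math.sin(w) ** 2)
    return math.sqrt(2) * num / den

def power_complementarity_gap(w):
    """|G0(w)|^2 + |G0(w + pi)|^2 - 2; zero for an orthogonal two-channel lowpass."""
    return G0(w) ** 2 + G0(w + math.pi) ** 2 - 2

def taps(n_max, points=4096):
    """g0[n] for |n| <= n_max via a Riemann sum for the inverse DTFT.

    G0 is real and even, so the inverse DTFT reduces to a cosine sum."""
    ws = [-math.pi + (i + 0.5) * 2 * math.pi / points for i in range(points)]
    vals = [G0(w) for w in ws]
    return {n: sum(v * math.cos(w * n) for v, w in zip(vals, ws)) / points
            for n in range(-n_max, n_max + 1)}

g0 = taps(30)
print(sum(g0.values()), sum(c * c for c in g0.values()))
print(max(abs(g0[n]) for n in range(25, 31)))
```

The sum of the taps should come out close to $G_0(e^{j0}) = \sqrt{2}$ and their energy close to 1. The tiny magnitude of the outer taps is the exponential decay discussed below.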
Figure 4.13 Linear spline basis. (a) Scaling function $\varphi(t)$. (b) Fourier transform magnitude $|\Phi(\omega)|$. (c) Wavelet $\psi(t)$. (d) Fourier transform magnitude $|\Psi(\omega)|$.



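and the wavelet follows from (4.3.19) as

$$\Psi(\omega) = -e^{-j\omega/2}\,\frac{\sin^2(\omega/4)\,\big(1 - (2/3)\cos^2(\omega/4)\big)^{1/2}}{\big(1 - (2/3)\sin^2(\omega/2)\big)^{1/2}}\;\Phi\!\left(\frac{\omega}{2}\right),$$

or

$$\Psi(\omega) = -e^{-j\omega/2}\,\frac{\sin^4(\omega/4)}{(\omega/4)^2}\left(\frac{1 - (2/3)\cos^2(\omega/4)}{\big(1 - (2/3)\sin^2(\omega/2)\big)\big(1 - (2/3)\sin^2(\omega/4)\big)}\right)^{1/2}. \tag{4.3.24}$$

Rewrite the above as

$$\Psi(\omega) = \frac{\sin^2(\omega/4)}{(\omega/4)^2}\,Q(\omega), \tag{4.3.25}$$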
where the definition of $Q(\omega)$, which is $4\pi$-periodic, follows from (4.3.24). Taking the inverse Fourier transform of (4.3.25) leads to

$$\psi(t) = \sum_{n} q[n]\,\beta^{(1)}(2t - n),$$

with the sequence $\{q[n]\}$ being the inverse Fourier transform of $Q(\omega)$. Therefore, $\psi(t)$ is piecewise linear over half-integer intervals, as can be seen in Figure 4.13(c). In this simple example, the multiresolution approximation is particularly clear. As said at the outset, $V_0^{(1)}$ is the space of functions piecewise linear over integer intervals, and likewise, $V_{-1}^{(1)}$ has the same property but over half-integer intervals. Therefore, $W_0^{(1)}$ (which is the orthogonal complement to $V_0^{(1)}$ in $V_{-1}^{(1)}$) contains the difference between a function in $V_{-1}^{(1)}$ and its approximation in $V_0^{(1)}$. Such a difference is obviously piecewise linear over half-integer intervals.

With the above construction, we have obtained orthonormal bases for $V_0^{(1)}$ and $W_0^{(1)}$ as the sets of functions $\{\varphi(t-n)\}$ and $\{\psi(t-n)\}$ respectively. What was given up, however, is the compact support that $\beta^{(N)}(t)$ has. But it can be shown that the scaling function and the wavelet have exponential decay. The argument begins with the fact that $\varphi(t)$ is a linear combination of functions $\beta^{(N)}(t-n)$. Because $\beta^{(N)}(t)$ has compact support, a finite number of functions from the set $\{\beta^{(N)}(t-n)\}_{n\in\mathbb{Z}}$ contribute to $\varphi(t)$ for a given $t$ (for example, two in the linear spline case). That is, $|\varphi(t)|$ is of the same order as $|\sum_{l=0}^{N} \alpha_{k+l}|$ where $k = \lfloor t \rfloor$. Now, $\{\alpha_k\}$ is the impulse response of a stable filter (noncausal in general) because it has no poles on the unit circle (this follows from (4.3.13)). Therefore, the sequence $\alpha_k$ decays exponentially and so does $\varphi(t)$. The same argument holds for $\psi(t)$ as well. For a formal proof of this result, see [73]. While the compact support of $\beta^{(N)}(t)$ has been lost, the fast decay indicates that $\varphi(t)$ and $\psi(t)$ are concentrated around the origin, as is clear from Figures 4.13(a) and (c). The above discussion on orthogonalization was limited to the very simple linear spline case. However, it is clear that it works for the general B-spline case since it is based on the orthogonalization (4.3.14). For example, the quadratic spline, given by

$$B^{(2)}(\omega) = e^{-j\omega/2}\left(\frac{\sin(\omega/2)}{\omega/2}\right)^3, \tag{4.3.26}$$

leads to a function $B^{(5)}(\omega)$ (see (4.3.11)) equal to

$$B^{(5)}(\omega) = \frac{1}{120}\left(66 + 26\,(e^{j\omega} + e^{-j\omega}) + e^{j2\omega} + e^{-j2\omega}\right), \tag{4.3.27}$$

which can be used to orthogonalize $B^{(2)}(\omega)$ (see Problem 4.7).
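The closed form for the quadratic case can be compared against a truncated periodization (4.3.11) with $N = 2$. In this Python sketch the phase factor $e^{-j\omega/2}$ of (4.3.26) is dropped, since only $|B^{(2)}(\omega)|^2$ enters. The periodization equals 1 at $\omega = 0$, so the coefficients 66, 26, 1 carry an overall factor $1/120$; the values $66/120$, $26/120$, $1/120$ are the samples of the quintic B-spline, as (4.3.12) predicts. The truncation `K` is an arbitrary accuracy setting.

```python
import math

def B2_abs(w):
    """|B^(2)(w)| from (4.3.26); the phase e^{-jw/2} does not affect |.|^2."""
    if w == 0.0:
        return 1.0
    return abs(math.sin(w / 2) / (w / 2)) ** 3

def B5_periodized(w, K=200):
    """Truncated (4.3.11) with N = 2: sum over |k| <= K of |B^(2)(w + 2 k pi)|^2."""
    return sum(B2_abs(w + 2 * k * math.pi) ** 2 for k in range(-K, K + 1))

def B5_closed_form(w):
    """(66 + 26(e^{jw} + e^{-jw}) + e^{j2w} + e^{-j2w}) / 120, written as a real function."""
    return (66 + 52 * math.cos(w) + 2 * math.cos(2 * w)) / 120

for w in (0.0, 0.5, 1.7, math.pi):
    print(w, B5_periodized(w), B5_closed_form(w))
```

The minimum of the closed form, $16/120$ at $\omega = \pi$, is strictly positive. This is the bound (4.3.13) that makes the square-root orthogonalization well defined.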
Note that instead of taking a square root of $B^{(2N+1)}(\omega)$ in the orthogonalization of $B^{(N)}(\omega)$ (see (4.3.14)), one can use spectral factorization which leads to wavelets based on IIR filters [133, 296] (see also Section 4.6.2 and Problem 4.8). Alternatively, it is possible to give up intrascale orthogonality (but keep interscale orthogonality). See [299] for such a construction where a possible scaling function is a B-spline. One advantage of keeping a scaling function that is a spline is that, as the order increases, its localization in time and frequency rapidly approaches the optimum since it tends to a Gaussian [297].

An interesting limiting result occurs in the case of orthogonal wavelets for B-spline spaces. As the order of splines goes to infinity, the scaling function tends to the ideal lowpass or sinc function [7, 175]. In our B-spline construction with $N = 0$ and $N \to \infty$, we thus recover the Haar and sinc cases discussed in Section 4.2.3 as extreme cases of a multiresolution analysis.

4.4 Wavelets Derived from Iterated Filter Banks and Regularity 

In the previous section, we constructed orthonormal families of functions where each 
function was related to a single prototype wavelet through shifting and scaling. 
The construction was a direct continuous-time approach based on the axioms of 
multiresolution analysis. In this section, we will take a different, indirect approach 
that also leads to orthonormal families derived from a prototype wavelet. Instead 
of a direct continuous-time construction, we will start with discrete-time filters. 
They can be iterated and under certain conditions will lead to continuous-time 
wavelets. This important construction, pioneered by Daubechies [71], produces 
very practical wavelet decomposition schemes, since they are implementable with finite-length discrete-time filters.

In this section, we will first review the Haar and sinc wavelets as limits of discrete-time filters. Then we extend this construction to general orthogonal filters, showing how to obtain a scaling function $\varphi$ and a wavelet $\psi$ as limits of an appropriate graphical function. This will lead to a discussion of basic properties of $\varphi$ and $\psi$, namely orthogonality and two-scale equations. It will be indicated that the function system $\{2^{-m/2}\psi(2^{-m}t - n)\}$, $m, n \in \mathbb{Z}$, forms an orthonormal basis for $L_2(\mathbb{R})$.

A key property that the discrete-time filter has to satisfy is the regularity condition, which we explore first by way of examples. A discrete-time filter will be called regular if it converges (through the iteration scheme we will discuss) to a scaling function and wavelet with some degree of regularity (for example, piecewise smooth, continuous, or differentiable). We show conditions that have to be met by the filter and describe regularity testing methods. Then, Daubechies' family of maximally regular filters will be derived.
Figure 4.14 Filter bank iterated on the lowpass channel: connection between discrete- and continuous-time cases.



4.4.1 Haar and Sinc Cases Revisited

As seen earlier, the Haar and sinc cases are two particular examples which are duals of each other, or two extreme cases. Both are useful to explain the iterated filter bank construction. The Haar case is most obvious in the time domain, while the sinc case is immediate in the frequency domain.



Haar Case Consider the discrete-time Haar filters (see also Section 4.1.3). The lowpass is the average of two neighboring samples, while the highpass is their difference. The corresponding orthogonal filter bank has filters $g_0[n] = [1/\sqrt{2}, 1/\sqrt{2}]$ and $g_1[n] = [1/\sqrt{2}, -1/\sqrt{2}]$ which are the basis functions of the discrete-time Haar expansion. Now consider what happens if we iterate the filter bank on the lowpass channel, as shown in Figure 4.14. In order to derive an equivalent filter bank, we recall the following result from multirate signal processing (Section 2.5.3): Filtering by $g_0[n]$ followed by upsampling by two is equivalent to upsampling by two, followed by filtering by $g_0'[n]$, where $g_0'[n]$ is the upsampled version of $g_0[n]$.

Using this equivalence, we can transform the filter-bank tree into one equivalent to the one depicted in Figure 3.8 where we assumed three stages and Haar filters. It is easy to verify that this corresponds to an orthogonal filter bank (it is the cascade of orthogonal filter banks). This is a size-8 discrete Haar transform on successive blocks of 8 samples. Iterating the lowpass channel in Figure 4.14 $i$ times will lead to the equivalent last two filters

$$g_0^{(i)}[n] = \begin{cases} 2^{-i/2} & n = 0, \ldots, 2^i - 1, \\ 0 & \text{otherwise,} \end{cases}$$

$$g_1^{(i)}[n] = \begin{cases} 2^{-i/2} & n = 0, \ldots, 2^{i-1} - 1, \\ -2^{-i/2} & n = 2^{i-1}, \ldots, 2^i - 1, \\ 0 & \text{otherwise,} \end{cases}$$

where $g_0^{(i)}[n]$ is a lowpass filter and $g_1^{(i)}[n]$ a bandpass filter. Note also that $g_0^{(1)}[n] = g_0[n]$ and $g_1^{(1)}[n] = g_1[n]$. As we can see, as $i$ becomes large the length grows exponentially and the coefficients go to zero.
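The closed forms for $g_0^{(i)}[n]$ and $g_1^{(i)}[n]$ can be reproduced by actually cascading the filters with the multirate equivalence quoted above. Each earlier stage contributes a copy of $g_0$ upsampled by $2^k$, which is the time-domain counterpart of multiplying $G_0(e^{j\omega})G_0(e^{j2\omega})\cdots$. This is a plain-Python sketch, and the helper names are hypothetical.

```python
import math

def upsample(h, factor):
    """Insert factor - 1 zeros between taps: h_up[factor * n] = h[n]."""
    out = [0.0] * ((len(h) - 1) * factor + 1)
    for n, c in enumerate(h):
        out[factor * n] = c
    return out

def convolve(a, b):
    """Full linear convolution of two finite sequences starting at index 0."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def iterated_filters(g0, g1, i):
    """Equivalent lowpass and bandpass filters after i stages of lowpass iteration."""
    low = [1.0]
    for k in range(i - 1):
        low = convolve(low, upsample(g0, 2 ** k))  # stage k contributes G0(z^(2^k))
    last = 2 ** (i - 1)
    return convolve(low, upsample(g0, last)), convolve(low, upsample(g1, last))

def haar_closed_form(i):
    """The closed forms for g0^(i)[n] and g1^(i)[n] given in the text."""
    c = 2 ** (-i / 2)
    half = 2 ** (i - 1)
    return [c] * (2 * half), [c] * half + [-c] * half

s = 1 / math.sqrt(2)
g0, g1 = [s, s], [s, -s]
lo3, hi3 = iterated_filters(g0, g1, 3)
print(len(lo3), lo3[0], hi3)
```

Because `iterated_filters` takes arbitrary `g0` and `g1`, the same code produces the equivalent filters for any two-channel filter bank iterated on the lowpass branch, not only for the Haar pair.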

Let us now define a continuous-time function associated with $g_0^{(i)}[n]$ and $g_1^{(i)}[n]$ in the following way:

$$\begin{aligned} \varphi^{(i)}(t) &= 2^{i/2}\,g_0^{(i)}[n], &\quad \frac{n}{2^i} \le t < \frac{n+1}{2^i}, \\ \psi^{(i)}(t) &= 2^{i/2}\,g_1^{(i)}[n], &\quad \frac{n}{2^i} \le t < \frac{n+1}{2^i}. \end{aligned} \tag{4.4.1}$$

These functions are piecewise constant and because the interval diminishes at the same speed as the length of $g_0^{(i)}[n]$ and $g_1^{(i)}[n]$ increases, their lengths remain bounded.

For example, $\varphi^{(3)}(t)$ and $\psi^{(3)}(t)$ (the functions associated with the two bottom filters of Figure 3.8) are simply the indicator function of the $[0, 1]$ interval and the difference between the indicator functions of $[0, \frac{1}{2}]$ and $[\frac{1}{2}, 1]$, respectively. Of course, in this particular example, it is clear that $\varphi^{(i)}(t)$ and $\psi^{(i)}(t)$ are all identical, regardless of $i$. What is also worth noting is that $\varphi^{(i)}(t)$ and $\psi^{(i)}(t)$ are orthogonal to each other and to their translates. Note that

$$\varphi^{(i)}(t) = 2^{1/2}\left(g_0[0]\,\varphi^{(i-1)}(2t) + g_0[1]\,\varphi^{(i-1)}(2t-1)\right)$$

or, because $\varphi^{(i)}(t) = \varphi^{(i-1)}(t)$ in this particular example,

$$\varphi(t) = 2^{1/2}\left(g_0[0]\,\varphi(2t) + g_0[1]\,\varphi(2t-1)\right).$$

Thus, the scaling function $\varphi(t)$ satisfies a two-scale equation.

Sinc Case Recall the sinc case (see Example 4.2). Take an orthogonal filter bank where the lowpass and highpass filters are ideal half-band filters. The impulse response of the lowpass filter is

$$g_0[n] = \frac{1}{\sqrt{2}}\,\frac{\sin(\pi n/2)}{\pi n/2} \tag{4.4.2}$$

(see also (4.2.21)) which is orthogonal to its even translates and of norm 1. Its $2\pi$-periodic Fourier transform is equal to $\sqrt{2}$ for $|\omega| < \pi/2$, and 0 for $\pi/2 < |\omega| < \pi$. A perfect half-band highpass can be obtained by modulating $g_0[n]$ with $(-1)^n$, since this shifts the passband by $\pi$. For completeness, a shift by one is required as well. Thus (see (4.2.22))

$$g_1[n] = (-1)^n\,g_0[-n+1].$$

Its $2\pi$-periodic Fourier transform is

$$G_1(e^{j\omega}) = \begin{cases} -\sqrt{2}\,e^{-j\omega} & \frac{\pi}{2} \le |\omega| \le \pi, \\ 0 & 0 \le |\omega| < \frac{\pi}{2}. \end{cases} \tag{4.4.3}$$

Now consider the iterated filter bank as in Figure 4.14 with ideal filters. Upsampling the filter impulse response by 2 (to pass it across the upsampler) leads to a filter $g_0'[n]$ with discrete-time Fourier transform (see Section 2.5.3)

$$G_0'(e^{j\omega}) = G_0(e^{j2\omega}),$$

which is $\pi$-periodic. It is easy to check that $G_0'(e^{j\omega})\,G_0(e^{j\omega})$ is a quarter-band filter. Similarly, with $G_1'(e^{j\omega}) = G_1(e^{j2\omega})$, it is clear that $G_1'(e^{j\omega})\,G_0(e^{j\omega})$ is a bandpass filter with a passband from $\pi/4$ to $\pi/2$. Figure 4.15 shows the amplitude frequency responses of the equivalent filters for a three-step division.

Let us emulate the Haar construction with $g_0^{(i)}[n]$ and $g_1^{(i)}[n]$, which are the lowpass and bandpass equivalent filters for the cascade of $i$ banks. In Figures 4.15(c) and (d), we thus have the frequency responses of $g_1^{(3)}[n]$ and $g_0^{(3)}[n]$, respectively. Then, we define $\varphi^{(i)}(t)$ as in (4.4.1). The procedure for obtaining $\varphi^{(i)}(t)$ can be described by the following two steps:

(a) Associate with $g_0^{(i)}[n]$ a sequence of weighted Dirac pulses spaced $2^{-i}$ apart. This sequence has a $2^i \cdot 2\pi$-periodic Fourier transform.

(b) Convolve this pulse sequence with an indicator function for the interval $[0, 2^{-i}]$ of height $2^{i/2}$ (so it is of norm 1).



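Therefore the Fourier transform of $\varphi^{(i)}(t)$ is

$$\Phi^{(i)}(\omega) = 2^{-i/2}\, G_0^{(i)}\!\left(e^{j\omega/2^i}\right) e^{-j\omega/2^{i+1}}\, \frac{\sin(\omega/2^{i+1})}{\omega/2^{i+1}}.$$

Now,

$$G_0^{(i)}(e^{j\omega}) = G_0(e^{j\omega})\, G_0(e^{j2\omega}) \cdots G_0(e^{j2^{i-1}\omega}). \tag{4.4.4}$$

We introduce the shorthand

$$M_0(\omega) = \frac{1}{\sqrt{2}}\, G_0(e^{j\omega}). \tag{4.4.5}$$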
Figure 4.15 Amplitude frequency response of a three-step iterated filter bank with ideal half-band low and highpass filters. (a) $|G_1(e^{j\omega})|$, (b) $|G_0(e^{j\omega})\,G_1(e^{j2\omega})|$, (c) $|G_0(e^{j\omega})\,G_0(e^{j2\omega})\,G_1(e^{j4\omega})|$, (d) $|G_0(e^{j\omega})\,G_0(e^{j2\omega})\,G_0(e^{j4\omega})|$.



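Note that $M_0(0) = 1$. We can rewrite $\Phi^{(i)}(\omega)$ as

$$\Phi^{(i)}(\omega) = \left[\prod_{k=1}^{i} M_0\!\left(\frac{\omega}{2^k}\right)\right] e^{-j\omega/2^{i+1}}\, \frac{\sin(\omega/2^{i+1})}{\omega/2^{i+1}}. \tag{4.4.6}$$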



The important part in (4.4.6) is the product inside the square brackets (the rest is just a phase factor and the interpolation function). In particular, as $i$ becomes large, the second part tends toward 1 for any finite $\omega$. Thus, let us consider the product involving $M_0(\omega)$ in (4.4.6). Because of the definitions of $M_0(\omega)$ in (4.4.5) and of $G_0(e^{j\omega})$ following (4.4.2), we get

$$M_0\!\left(\frac{\omega}{2^k}\right) = \begin{cases} 1, & (2l - \tfrac{1}{2})\,2^k\pi < \omega < (2l + \tfrac{1}{2})\,2^k\pi, \quad l \in \mathbb{Z}, \\ 0, & \text{otherwise.} \end{cases}$$
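The claim about the product can be checked numerically. The sketch below assumes NumPy, and its helper names are ours. It evaluates $\prod_{k=1}^{i} M_0(\omega/2^k)$ for the ideal half-band filter over one full period and compares the result with the indicator of $|\omega| < \pi$:

```python
import numpy as np

def M0_ideal(w):
    """Normalized ideal half-band lowpass, M0(w) = G0(e^{jw}) / sqrt(2):
    1 where w, wrapped to [-pi, pi), satisfies |w| < pi/2, and 0 otherwise."""
    w_wrapped = (np.asarray(w) + np.pi) % (2 * np.pi) - np.pi
    return (np.abs(w_wrapped) < np.pi / 2).astype(float)

def truncated_product(w, i):
    """prod_{k=1}^{i} M0(w / 2^k): the bracketed term in (4.4.6)."""
    out = np.ones_like(w, dtype=float)
    for k in range(1, i + 1):
        out *= M0_ideal(w / 2.0**k)
    return out

i = 5
# One period [-2^i pi, 2^i pi) of the product, offset slightly so that no sample
# falls exactly on a band edge (all band edges are integer multiples of pi).
w = np.linspace(-2**i * np.pi, 2**i * np.pi, 64001)[:-1] + 1e-3
prod = truncated_product(w, i)
ideal_lowpass = (np.abs(w) < np.pi).astype(float)
mismatch = int(np.count_nonzero(prod != ideal_lowpass))
print("samples where the product differs from the ideal lowpass:", mismatch)
```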
The product

$$M_0\!\left(\frac{\omega}{2}\right) M_0\!\left(\frac{\omega}{4}\right) \cdots M_0\!\left(\frac{\omega}{2^i}\right)$$

is $2^i \cdot 2\pi$-periodic and equal to 1 for $\omega$ between $-\pi$ and $\pi$, and 0 elsewhere within a period. Therefore, as $i$ goes to infinity, we are left with a perfect lowpass from $-\pi$ to $\pi$, that is

$$\lim_{i\to\infty} \varphi^{(i)}(t) = \frac{\sin(\pi t)}{\pi t},$$

or a sinc scaling function. What happens to the function $\psi^{(i)}(t)$? The iterated filter becomes

$$G_1^{(i)}(e^{j\omega}) = G_0(e^{j\omega}) \cdots G_0(e^{j2^{i-2}\omega})\, G_1(e^{j2^{i-1}\omega}),$$

where $G_1(e^{j\omega})$ is given by (4.4.3). The Fourier transform of the wavelet is thus



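$$\Psi^{(i)}(\omega) = M_1\!\left(\frac{\omega}{2}\right) \left[\prod_{k=2}^{i} M_0\!\left(\frac{\omega}{2^k}\right)\right] e^{-j\omega/2^{i+1}}\, \frac{\sin(\omega/2^{i+1})}{\omega/2^{i+1}}, \tag{4.4.7}$$

where, similarly to (4.4.5),

$$M_1(\omega) = \frac{1}{\sqrt{2}}\, G_1(e^{j\omega}). \tag{4.4.8}$$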

Suppose that we have $i = 3$. Note that $M_1(\omega/2)$ produces, following (4.4.3), a phase shift of $e^{-j\omega/2}$ (together with a sign change), or a time-domain delay of $1/2$. It is clear that as $i$ goes to infinity, $\Psi^{(i)}(\omega)$ converges to the indicator function of the band $\pi < |\omega| < 2\pi$, multiplied by the factor $-e^{-j\omega/2}$. Thus

$$\lim_{i\to\infty} \psi^{(i)}(t) = \frac{\sin \pi(t - \frac{1}{2}) - \sin 2\pi(t - \frac{1}{2})}{\pi(t - \frac{1}{2})}.$$

This is of course the sinc wavelet we had introduced in Section 4.2 (see (4.2.23)).
What we have just seen seems a cumbersome way to rederive a known result. 
However, it is an instance of a general construction and some properties can be 
readily seen. For example, assuming that the infinite product converges, the scaling 
function satisfies (from (4.4.6)) 

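$$\Phi(\omega) = \lim_{i\to\infty} \Phi^{(i)}(\omega) = \prod_{k=1}^{\infty} M_0\!\left(\frac{\omega}{2^k}\right) = M_0\!\left(\frac{\omega}{2}\right) \Phi\!\left(\frac{\omega}{2}\right)$$

or, in time domain

$$\varphi(t) = \sqrt{2} \sum_{n=-\infty}^{\infty} g_0[n]\, \varphi(2t - n),$$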
and similarly, the wavelet satisfies

$$\psi(t) = \sqrt{2} \sum_{n} g_1[n]\, \varphi(2t - n).$$



That is, the two-scale equation property is implicit in the construction of the iterated function. The key in this construction is the behavior of the infinite product of the $M_0(\omega/2^k)$'s. This leads to the fundamental regularity property of the discrete-time filters involved, which will be studied below. But first, we formalize the iterated filter bank construction.

4.4.2 Iterated Filter Banks 

We will now show that the above derivation of the Haar and sinc wavelets using iterated filter banks can be used in general to obtain wavelet bases, assuming that the filters satisfy some regularity constraints. In our discussion, we will concentrate mainly on the well-behaved cases, namely when the graphical function $\varphi^{(i)}(t)$ (associated with the iterated impulse response $g_0^{(i)}[n]$) converges in $L_2(\mathbb{R})$ to a piecewise smooth⁵ function $\varphi(t)$ (possibly with more regularity, such as continuity). In this case, the Fourier transform $\Phi^{(i)}(\omega)$ converges in $L_2(\mathbb{R})$ to $\Phi(\omega)$ (the Fourier transform of $\varphi(t)$). That is, one can study the behavior of the iteration either in the time or in the frequency domain. A counter-example to this "nice" behavior is discussed in Example 4.3 below.

To demonstrate the construction, we start with a two-channel orthogonal filter bank as given in Section 3.2. Let $g_0[n]$ and $g_1[n]$ denote lowpass and highpass filters, respectively. Similarly to the Haar and sinc cases, the filter bank is iterated on the branch with the lowpass filter (see Figure 4.14) and the process is iterated to infinity. The constructions in the previous section indicate how to proceed. First, express the two equivalent filters after $i$ steps of iteration as follows (use the fact that filtering with $G_i(z)$ followed by upsampling by 2 is equivalent to upsampling by 2 followed by filtering with $G_i(z^2)$):

$$G_0^{(i)}(z) = \prod_{k=0}^{i-1} G_0\!\left(z^{2^k}\right), \tag{4.4.9}$$

$$G_1^{(i)}(z) = G_1\!\left(z^{2^{i-1}}\right) \prod_{k=0}^{i-2} G_0\!\left(z^{2^k}\right), \qquad i = 1, 2, \ldots$$



⁵ This is more restrictive than necessary, but makes the treatment easier.
These filters are preceded by upsampling by $2^i$ (note that $G_0^{(0)}(z) = G_1^{(0)}(z) = 1$). Then, associate the discrete-time iterated filters $g_0^{(i)}[n]$, $g_1^{(i)}[n]$ with the continuous-time functions $\varphi^{(i)}(t)$, $\psi^{(i)}(t)$ as follows:

$$\varphi^{(i)}(t) = 2^{i/2}\, g_0^{(i)}[n], \qquad \frac{n}{2^i} \le t < \frac{n+1}{2^i}, \tag{4.4.10}$$

$$\psi^{(i)}(t) = 2^{i/2}\, g_1^{(i)}[n], \qquad \frac{n}{2^i} \le t < \frac{n+1}{2^i}. \tag{4.4.11}$$



Note that the elementary interval is divided by $2^i$. This rescaling is necessary because if the length of the filter $g_0[n]$ is $L$, then the length of the iterated filter $g_0^{(i)}[n]$ is

$$L^{(i)} = (2^i - 1)(L - 1) + 1,$$

which will become infinite as $i \to \infty$. Thus, the normalization ensures that the associated continuous-time function $\varphi^{(i)}(t)$ stays compactly supported (as $i \to \infty$, $\varphi^{(i)}(t)$ will remain within the interval $[0, L-1]$). The factor $2^{i/2}$ which multiplies $g_0^{(i)}[n]$ and $g_1^{(i)}[n]$ is necessary to preserve the $L_2$ norm between the discrete- and continuous-time cases. If $\|g_0^{(i)}[n]\| = 1$, then $\|\varphi^{(i)}(t)\| = 1$ as well, since each piecewise constant block has norm $|g_0^{(i)}[n]|$.
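To make the bookkeeping concrete, here is a minimal NumPy sketch that builds $g_0^{(i)}[n]$ by the product (4.4.9). It then checks three things: the length formula, the unit $L_2$ norm of the graphical function and its support inside $[0, L-1]$. The helper names are ours, and the test filter is an assumption: we use the length-4 Daubechies lowpass filter with coefficients $(1+\sqrt3,\, 3+\sqrt3,\, 3-\sqrt3,\, 1-\sqrt3)/(4\sqrt2)$.

```python
import numpy as np

def upsample(x, factor):
    """Insert (factor - 1) zeros between consecutive samples of x."""
    y = np.zeros((len(x) - 1) * factor + 1)
    y[::factor] = x
    return y

def iterate_lowpass(g0, i):
    """g0^{(i)}[n] from (4.4.9): the product G0(z) G0(z^2) ... G0(z^(2^(i-1)))."""
    g = np.array([1.0])
    for k in range(i):
        g = np.convolve(g, upsample(g0, 2**k))
    return g

# Example filter (our choice): length-4 Daubechies lowpass, unit norm, sum sqrt(2).
s3 = np.sqrt(3.0)
g0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
L = len(g0)

results = []
for i in range(1, 7):
    gi = iterate_lowpass(g0, i)
    heights = 2**(i / 2) * gi          # block heights of phi^{(i)}(t), see (4.4.10)
    width = 2.0**-i                    # block width 2^{-i}
    l2_norm = np.sqrt(np.sum(heights**2) * width)
    support_end = len(gi) * width      # right end of the support of phi^{(i)}(t)
    predicted_len = (2**i - 1) * (L - 1) + 1
    results.append((i, len(gi), predicted_len, l2_norm, support_end))
    print(f"i={i}: length {len(gi)} (formula {predicted_len}), "
          f"||phi^(i)|| = {l2_norm:.12f}, support [0, {support_end:.4f})")
```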

In Figure 4.16 we show the graphical function for the first four iterations of a 
length-4 filter. This indicates the piecewise constant approximation and the halving 
of the interval. 

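In the Fourier domain, using $M_0(\omega) = G_0(e^{j\omega})/\sqrt{2}$ and $M_1(\omega) = G_1(e^{j\omega})/\sqrt{2}$, we can write (4.4.10) and (4.4.11) as (from (4.4.6))

$$\Phi^{(i)}(\omega) = \left[\prod_{k=1}^{i} M_0\!\left(\frac{\omega}{2^k}\right)\right] \Theta^{(i)}(\omega),$$

where

$$\Theta^{(i)}(\omega) = e^{-j\omega/2^{i+1}}\, \frac{\sin(\omega/2^{i+1})}{\omega/2^{i+1}},$$

as well as (from (4.4.7))

$$\Psi^{(i)}(\omega) = M_1\!\left(\frac{\omega}{2}\right) \left[\prod_{k=2}^{i} M_0\!\left(\frac{\omega}{2^k}\right)\right] \Theta^{(i)}(\omega).$$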
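The Fourier-domain expression can be checked against a direct computation. The sketch below integrates the piecewise-constant $\varphi^{(i)}(t)$ of (4.4.10) exactly, block by block. It compares the result with $[\prod_{k=1}^{i} M_0(\omega/2^k)]\,\Theta^{(i)}(\omega)$, where $\Theta^{(i)}(\omega) = e^{-j\omega/2^{i+1}}\sin(\omega/2^{i+1})/(\omega/2^{i+1})$. NumPy is assumed, the helper names are hypothetical, and the length-4 Daubechies filter serves only as an example.

```python
import numpy as np

def upsample(x, factor):
    y = np.zeros((len(x) - 1) * factor + 1)
    y[::factor] = x
    return y

def iterate_lowpass(g0, i):
    """g0^{(i)}[n] via the product (4.4.9)."""
    g = np.array([1.0])
    for k in range(i):
        g = np.convolve(g, upsample(g0, 2**k))
    return g

def M0(g0, w):
    """M0(w) = G0(e^{jw}) / sqrt(2), with G0(e^{jw}) = sum_n g0[n] e^{-jwn}."""
    n = np.arange(len(g0))
    return np.exp(-1j * np.outer(w, n)) @ g0 / np.sqrt(2.0)

def ft_graphical(g0, i, w):
    """Exact Fourier transform of the piecewise-constant phi^{(i)}(t) in (4.4.10):
    sum_n 2^{i/2} g[n] * integral_{n/2^i}^{(n+1)/2^i} e^{-jwt} dt."""
    g = iterate_lowpass(g0, i)
    n = np.arange(len(g))
    a, b = n / 2.0**i, (n + 1) / 2.0**i
    # integral of e^{-jwt} over [a, b] equals (e^{-jwa} - e^{-jwb}) / (jw) for w != 0
    blocks = (np.exp(-1j * np.outer(w, a)) - np.exp(-1j * np.outer(w, b))) / (1j * w[:, None])
    return blocks @ (2**(i / 2) * g)

def product_formula(g0, i, w):
    """[prod_{k=1}^{i} M0(w/2^k)] * Theta^{(i)}(w)."""
    p = np.ones_like(w, dtype=complex)
    for k in range(1, i + 1):
        p *= M0(g0, w / 2.0**k)
    x = w / 2.0**(i + 1)
    theta = np.exp(-1j * x) * np.sin(x) / x
    return p * theta

s3 = np.sqrt(3.0)
g0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
w = np.linspace(0.1, 40.0, 500)   # avoid w = 0, where both expressions need a limit
err = np.max(np.abs(ft_graphical(g0, 5, w) - product_formula(g0, 5, w)))
print("max discrepancy:", err)
```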



A fundamental question is: To what, if anything, do the functions $\varphi^{(i)}(t)$ and $\psi^{(i)}(t)$ converge as $i \to \infty$? We will proceed by assuming convergence to piecewise smooth functions in $L_2(\mathbb{R})$:

$$\varphi(t) = \lim_{i\to\infty} \varphi^{(i)}(t), \tag{4.4.12}$$
Figure 4.16 Graphical functions corresponding to the first four iterations of an orthonormal 4-tap filter with two zeros at $\omega = \pi$. The filter is given in the first column of Table 4.3. (a) $\varphi^{(1)}(t)$. (b) $\varphi^{(2)}(t)$. (c) $\varphi^{(3)}(t)$. (d) $\varphi^{(4)}(t)$.



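$$\psi(t) = \lim_{i\to\infty} \psi^{(i)}(t). \tag{4.4.13}$$

In the Fourier domain, the above equations become

$$\Phi(\omega) = \lim_{i\to\infty} \Phi^{(i)}(\omega) = \prod_{k=1}^{\infty} M_0\!\left(\frac{\omega}{2^k}\right), \tag{4.4.14}$$

$$\Psi(\omega) = \lim_{i\to\infty} \Psi^{(i)}(\omega) = M_1\!\left(\frac{\omega}{2}\right) \prod_{k=2}^{\infty} M_0\!\left(\frac{\omega}{2^k}\right), \tag{4.4.15}$$

since the phase and interpolation factor $\Theta^{(i)}(\omega) = e^{-j\omega/2^{i+1}}\sin(\omega/2^{i+1})/(\omega/2^{i+1})$ becomes 1 for any finite $\omega$ as $i \to \infty$. Next, we demonstrate that the functions $\varphi(t)$ and $\psi(t)$, obtained as limits of discrete-time iterated filters, are actually a scaling function and a wavelet, and that they carry along an underlying multiresolution analysis.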



4.4. WAVELETS DERIVED FROM ITERATED FILTER BANKS AND REGULARITY 255 

Two-Scale Equation Property Let us show that the scaling function $\varphi(t)$ satisfies a two-scale equation, as required by (4.2.8). Following (4.4.9), one can write the equivalent filter after $i$ steps in terms of the equivalent filter after $(i-1)$ steps as

$$g_0^{(i)}[n] = \sum_k g_0[k]\, g_0^{(i-1)}[n - 2^{i-1}k]. \tag{4.4.16}$$
Using (4.4.10), express the previous equation in terms of iterated functions:

$$g_0^{(i)}[n] = 2^{-i/2}\, \varphi^{(i)}(t), \tag{4.4.17}$$

$$g_0^{(i-1)}[n - 2^{i-1}k] = 2^{-(i-1)/2}\, \varphi^{(i-1)}(2t - k), \tag{4.4.18}$$

both for $n/2^i \le t < (n+1)/2^i$. Substituting (4.4.17) and (4.4.18) into (4.4.16) yields

$$\varphi^{(i)}(t) = \sqrt{2} \sum_k g_0[k]\, \varphi^{(i-1)}(2t - k). \tag{4.4.19}$$

By assumption, the iterated function $\varphi^{(i)}(t)$ converges to the scaling function $\varphi(t)$. Hence, take limits on both sides of (4.4.19) to obtain

$$\varphi(t) = \sqrt{2} \sum_k g_0[k]\, \varphi(2t - k), \tag{4.4.20}$$

that is, the limit of the discrete-time iterated filter (4.4.12) satisfies a two-scale equation. Similarly,

$$\psi(t) = \sqrt{2} \sum_k g_1[k]\, \varphi(2t - k).$$

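These relations also follow directly from the Fourier-domain expressions $\Phi(\omega)$ and $\Psi(\omega)$ since, for example, from (4.4.14) we get

$$\Phi(\omega) = \prod_{k=1}^{\infty} M_0\!\left(\frac{\omega}{2^k}\right) = M_0\!\left(\frac{\omega}{2}\right) \prod_{k=1}^{\infty} M_0\!\left(\frac{\omega}{2^{k+1}}\right) = M_0\!\left(\frac{\omega}{2}\right) \Phi\!\left(\frac{\omega}{2}\right).$$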
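Recursion (4.4.19) is closely related to what is often called the cascade algorithm. The hedged sketch below applies (4.4.19) in continuous time, sampling each $\varphi^{(i-1)}(2t-k)$ at block midpoints, and checks that it reproduces $2^{i/2} g_0^{(i)}[n]$ computed from the product (4.4.9). NumPy is assumed, the helper names are ours, and the length-4 Daubechies filter is only an example.

```python
import numpy as np

def upsample(x, factor):
    y = np.zeros((len(x) - 1) * factor + 1)
    y[::factor] = x
    return y

def iterate_lowpass(g0, i):
    """g0^{(i)}[n] via the product (4.4.9)."""
    g = np.array([1.0])
    for k in range(i):
        g = np.convolve(g, upsample(g0, 2**k))
    return g

def eval_piecewise(values, level, t):
    """Function equal to values[n] on [n/2^level, (n+1)/2^level), zero elsewhere."""
    idx = np.floor(np.asarray(t) * 2**level).astype(int)
    inside = (idx >= 0) & (idx < len(values))
    out = np.zeros(len(idx))
    out[inside] = values[idx[inside]]
    return out

def cascade_step(g0, values, level):
    """One application of (4.4.19): block values of phi^{(level)} at resolution
    2^{-level} -> block values of phi^{(level+1)} on blocks covering [0, L-1)."""
    new_level = level + 1
    n = np.arange((len(g0) - 1) * 2**new_level)
    t_mid = (n + 0.5) / 2**new_level   # phi^{(level+1)} is constant on each block
    out = np.zeros(len(n))
    for k, gk in enumerate(g0):
        out += np.sqrt(2.0) * gk * eval_piecewise(values, level, 2 * t_mid - k)
    return out

s3 = np.sqrt(3.0)
g0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))

values = np.array([1.0])      # phi^{(0)}(t): indicator of [0, 1)
max_errs = []
for i in range(1, 7):
    values = cascade_step(g0, values, i - 1)
    direct = 2**(i / 2) * iterate_lowpass(g0, i)      # (4.4.10)
    padded = np.zeros(len(values))
    padded[:len(direct)] = direct
    max_errs.append(float(np.max(np.abs(values - padded))))
print("max |cascade - direct| per iteration:", max_errs)
```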

Orthogonality and Completeness of the Wavelet Basis We want to show that the wavelets constitute a basis for $L_2(\mathbb{R})$. To that end, we will have to prove the orthogonality as well as the completeness of the basis functions. First, however, let us recall a few facts that are going to be used in our discussion. We will assume that we have an orthonormal filter bank as seen in Section 3.2.3. We will also assume the following:
(a) $\langle g_0[k], g_1[k+2n]\rangle = 0$ and $\langle g_0[k], g_0[k+2n]\rangle = \langle g_1[k], g_1[k+2n]\rangle = \delta[n]$; that is, filters $g_0$ and $g_1$ are orthogonal to each other and to their even translates, as given in Section 3.2.3.

(b) $G_0(z)|_{z=1} = \sqrt{2}$ and $G_0(z)|_{z=-1} = 0$; that is, the lowpass filter has a zero at the aliasing frequency $\pi$ (see also the next section).

(c) The filters are FIR.

(d) $g_1[n] = (-1)^n g_0[-n+1]$, as given in Section 3.2.3.

(e) The scaling function and the wavelet are given by (4.4.12) and (4.4.13).
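Assumptions (a), (b) and (d) can be verified for a concrete filter. The sketch below assumes NumPy and uses the length-4 Daubechies filter as an example. It builds $g_1$ from (d), keeping track of its starting index, and checks the even-shift inner products of (a) and the values in (b). The helper `inner_shift` is hypothetical.

```python
import numpy as np

# Length-4 Daubechies lowpass filter, g0[n] for n = 0..3 (an example choice).
s3 = np.sqrt(3.0)
g0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
g0_start = 0

# Assumption (d): g1[n] = (-1)^n g0[-n + 1], supported on n = 2 - L .. 1.
L = len(g0)
g1_start = 2 - L
n1 = np.arange(g1_start, g1_start + L)
g1 = np.array([(-1.0)**n * g0[1 - n - g0_start] for n in n1])

def inner_shift(a, a_start, b, b_start, shift):
    """<a[k], b[k + shift]> = sum_k a[k] b[k + shift] for finite sequences
    given as (array, index of first sample)."""
    total = 0.0
    for idx, val in enumerate(a):
        j = a_start + idx + shift - b_start
        if 0 <= j < len(b):
            total += val * b[j]
    return total

# Assumption (a): orthonormality with respect to even shifts 2n.
for s in range(-2 * L, 2 * L + 1, 2):
    delta = 1.0 if s == 0 else 0.0
    assert abs(inner_shift(g0, g0_start, g0, g0_start, s) - delta) < 1e-12
    assert abs(inner_shift(g1, g1_start, g1, g1_start, s) - delta) < 1e-12
    assert abs(inner_shift(g0, g0_start, g1, g1_start, s)) < 1e-12

# Assumption (b): G0(1) = sqrt(2) and G0(-1) = 0.
G0_at_1 = np.sum(g0)
G0_at_minus1 = np.sum(g0 * (-1.0)**np.arange(L))
print("G0(1) =", G0_at_1, " G0(-1) =", G0_at_minus1)
```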

In the Haar case, it was shown that the scaling function and the wavelet were 
orthogonal to each other. Using appropriate shifts and scales, it was shown that 
the wavelets formed an orthonormal set. Here, we demonstrate these relations in 
the general case, starting from discrete-time iterated filters. The proof is given only for the first fact; the others would follow similarly.

PROPOSITION 4.4 Orthogonality Relations for the Scaling Function and Wavelet

(a) The scaling function is orthogonal to its appropriate translates at a given scale:

$$\langle \varphi(2^m t - n), \varphi(2^m t - n')\rangle = 2^{-m}\, \delta[n - n'].$$

(b) The wavelet is orthogonal to its appropriate translates at all scales:

$$\langle \psi(2^m t - n), \psi(2^m t - n')\rangle = 2^{-m}\, \delta[n - n'].$$

(c) The scaling function is orthogonal to the wavelet and its integer shifts:

$$\langle \varphi(t), \psi(t - n)\rangle = 0.$$

(d) Wavelets are orthogonal across scales and with respect to shifts:

$$\langle \psi(2^m t - n), \psi(2^{m'} t - n')\rangle = 2^{-m}\, \delta[m - m']\, \delta[n - n'].$$



Proof

To prove the first fact, we use induction on $\varphi^{(i)}$ and then take the limit (which exists by assumption). For clarity, this fact will be proven only for scale zero (scale $m$ would follow similarly). The first step $\langle \varphi^{(0)}(t), \varphi^{(0)}(t - l)\rangle = \delta[l]$ is obvious since, by definition, $\varphi^{(0)}(t)$ is just the indicator function of the interval $[0, 1)$. For the inductive step, write

$$\begin{aligned}
\langle \varphi^{(i+1)}(t), \varphi^{(i+1)}(t - l)\rangle &= \Big\langle \sqrt{2}\sum_k g_0[k]\,\varphi^{(i)}(2t - k),\ \sqrt{2}\sum_m g_0[m]\,\varphi^{(i)}(2t - 2l - m)\Big\rangle \\
&= 2\sum_k\sum_m g_0[k]\,g_0[m]\,\langle \varphi^{(i)}(2t - k), \varphi^{(i)}(2t - 2l - m)\rangle \\
&= \sum_m g_0[m]\,g_0[2l + m] = \langle g_0[m], g_0[2l + m]\rangle = \delta[l],
\end{aligned}$$

where we used the induction hypothesis at scale one, $\langle \varphi^{(i)}(2t-k), \varphi^{(i)}(2t-k')\rangle = \tfrac{1}{2}\delta[k-k']$, together with the orthogonality relations between discrete-time filters given at the beginning of this subsection. Taking the limits of both sides of the previous equation, the first fact is obtained. The proofs of the other facts follow similarly.
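The induction can be mirrored numerically. The sketch below computes $\langle \varphi^{(i)}(t), \varphi^{(i)}(t-l)\rangle$ directly from the iterated filter. It relies on the fact that a shift of $l$ in $t$ corresponds to a shift of $l\,2^i$ samples of $g_0^{(i)}[n]$. NumPy is assumed, the helper names are hypothetical, and the length-4 Daubechies filter is the example.

```python
import numpy as np

def upsample(x, factor):
    y = np.zeros((len(x) - 1) * factor + 1)
    y[::factor] = x
    return y

def iterate_lowpass(g0, i):
    """g0^{(i)}[n] via the product (4.4.9)."""
    g = np.array([1.0])
    for k in range(i):
        g = np.convolve(g, upsample(g0, 2**k))
    return g

def graphical_inner(g0, i, l):
    """<phi^{(i)}(t), phi^{(i)}(t - l)> for the piecewise-constant function (4.4.10).
    Blocks have width 2^{-i} and height 2^{i/2} g[n], so the integral reduces to
    sum_n g[n] g[n - l 2^i]."""
    g = iterate_lowpass(g0, i)
    s = abs(l) * 2**i
    if s >= len(g):
        return 0.0
    return float(np.dot(g[s:], g[:len(g) - s]))

s3 = np.sqrt(3.0)
g0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
table = {(i, l): graphical_inner(g0, i, l) for i in range(1, 6) for l in range(0, 4)}
for (i, l), v in sorted(table.items()):
    print(f"i={i}, l={l}: {v:+.2e}")
```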

We have thus verified that

$$S = \{2^{-m/2}\,\psi(2^{-m}t - n) \mid m, n \in \mathbb{Z},\ t \in \mathbb{R}\}$$

is an orthonormal set. The only remaining task is to show that the members of the set $S$ constitute an orthonormal basis for $L_2(\mathbb{R})$, as stated in the following theorem.

Theorem 4.5 [71]

The orthonormal set of functions $S = \{\psi_{m,n} \mid m, n \in \mathbb{Z},\ t \in \mathbb{R}\}$, where $\psi_{m,n}(t) = 2^{-m/2}\,\psi(2^{-m}t - n)$, is a basis for $L_2(\mathbb{R})$; that is, for every $f \in L_2(\mathbb{R})$

$$\sum_{m,n\in\mathbb{Z}} |\langle \psi_{m,n}, f\rangle|^2 = \|f\|^2.$$

Since the proof is rather technical and does not have an immediate intuitive interpretation, an outline is given in Appendix 4.A. For more details, the reader is referred to [71, 73]. Note that the statement of the theorem is nothing else but Parseval's equality as given by (d) in Theorem 2.4.

4.4.3 Regularity 

We have seen that the conditions under which (4.4.12) and (4.4.13) exist are critical. We will loosely say that they exist and lead to piecewise smooth functions if the filter $g_0[n]$ is regular. In other words, a regular filter leads, through iteration, to a scaling function with some degree of smoothness or regularity.

Given a filter $G_0(z)$ and an iterated filter bank scheme, the limit function $\varphi(t)$ depends on the behavior of the product

$$\prod_{k=1}^{i} M_0\!\left(\frac{\omega}{2^k}\right) \tag{4.4.21}$$

for large $i$, where $M_0(\omega) = G_0(e^{j\omega})/G_0(1)$ so that $M_0(0) = 1$. This normalization is necessary since otherwise the product either blows up at $\omega = 0$ (if $M_0(0) > 1$) or goes to zero (if $M_0(0) < 1$), which would mean that $\varphi(t)$ is not a lowpass function.

Key questions are: Does the product converge (and in what sense)? If it converges, what are the properties of the limit function (continuity, differentiability, etc.)? It can be shown that if $|M_0(\omega)| \le 1$ and $M_0(0) = 1$, then we have pointwise convergence of the infinite product to a limit function $\Phi(\omega)$ (see Problem 4.12). In particular, if $M_0(\omega)$ corresponds to the normalized lowpass filter in an orthonormal filter bank, then this condition is automatically satisfied. However, pointwise convergence is not sufficient. To build orthonormal bases we need $L_2$ convergence. This can be obtained by imposing some additional constraints on $M_0(\omega)$. Finally, beyond mere $L_2$ convergence, we would like to have a limit $\Phi(\omega)$ corresponding to a smooth function $\varphi(t)$. This can be achieved with further constraints on $M_0(\omega)$. Note that we will concentrate on the regularity of the lowpass filter, which leads to the scaling function $\varphi(t)$ in iterated filter bank schemes. The regularity of the wavelet $\psi(t)$ is equal to that of the scaling function when the filters are of finite length, since $\psi(t)$ is a finite linear combination of the functions $\varphi(2t - n)$.

First, it is instructive to reconsider a few examples. In the case of the perfect half-band lowpass filter, the limit function associated with the iterated filter converged to $\sin(\pi t)/\pi t$ in time. Note that this limit function is infinitely differentiable. In the Haar case, the lowpass filter, after normalization, gives

$$M_0(\omega) = \frac{1 + e^{-j\omega}}{2},$$

which converged to the box function, that is, to a function with two points of discontinuity. In other words, the product in (4.4.21) converges to

$$\prod_{k=1}^{\infty} M_0\!\left(\frac{\omega}{2^k}\right) = \prod_{k=1}^{\infty} \frac{1 + e^{-j\omega/2^k}}{2} = e^{-j\omega/2}\, \frac{\sin(\omega/2)}{\omega/2}. \tag{4.4.22}$$

For an alternative proof of this formula, see Problem 4.11. Now consider a filter with impulse response $[\frac{1}{2}, 1, \frac{1}{2}]$, that is, the Haar lowpass filter convolved with itself. The corresponding $M_0(\omega)$ is

$$M_0(\omega) = \frac{1 + 2e^{-j\omega} + e^{-j2\omega}}{4} = \left(\frac{1 + e^{-j\omega}}{2}\right)^2. \tag{4.4.23}$$

The product (4.4.21) can thus be split into two parts, each of which converges to the Fourier transform of the box function. Therefore, the limit function $\varphi(t)$ is the convolution of two boxes, that is, the hat function. This is a continuous function and is differentiable except at the points $t = 0$, 1, and 2. It is easy to see that if we have the $N$th power instead of the square in (4.4.23), the limit function will be the box convolved with itself $N-1$ times. This function is $(N-1)$-times differentiable (except at integers, where it is once less differentiable). These are the well-known B-spline functions [76, 255] (see also Section 4.3.2). An important fact to note is that each additional factor $(1 + e^{-j\omega})/2$ leads to one more degree of regularity. That is, zeros at $\omega = \pi$ in the discrete-time filter play an important role. However, zeros at $\omega = \pi$ are not sufficient to ensure regularity. We can see this in the following counter-example [71]:

Example 4.3 Convergence Problems

Consider the orthonormal filter $g_0[n] = [1/\sqrt{2}, 0, 0, 1/\sqrt{2}]$, or $M_0(\omega) = (1 + e^{-j3\omega})/2$. The infinite product in frequency becomes, following (4.4.22),

$$\Phi(\omega) = \prod_{k=1}^{\infty} \frac{1 + e^{-j3\omega/2^k}}{2} = e^{-j3\omega/2}\, \frac{\sin(3\omega/2)}{3\omega/2}, \tag{4.4.24}$$

which is the Fourier transform of 1/3 times the indicator function of the interval $[0, 3]$. This function is clearly not orthogonal to its integer translates, even though every finite iteration of the graphical function is. That is, (4.2.21) is not satisfied by the limit. Also, while every finite iteration is of norm 1, the limit is not (its norm is $1/\sqrt{3}$). Therefore, the infinite product fails to converge in $L_2$.

Looking at the time-domain graphical function (see Figure 4.17), it is easy to check that $\varphi^{(i)}(t)$ takes only the values 0 or 1. Therefore, there is no pointwise convergence on the interval $[0, 3]$. Note that $\varphi^{(i)}(t)$ is not of bounded variation as $i \to \infty$. Thus, even though $\varphi^{(i)}(t)$ and $\Phi^{(i)}(\omega)$ are valid Fourier transform pairs for any finite $i$, their limits are not, since $\varphi(t)$ does not exist while $\Phi(\omega)$ is given by (4.4.24). This simple example indicates that the convergence problem is nontrivial.
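The contrast between this counter-example and the well-behaved hat filter of (4.4.23) is easy to see numerically. The following sketch assumes NumPy, uses hypothetical helper names and builds the graphical functions from (4.4.10). For the hat filter, we assume a rescaling of $[\frac12, 1, \frac12]$ by $1/\sqrt{2}$ so that $G_0(1) = \sqrt{2}$, matching the normalization of (4.4.10).

```python
import numpy as np

def upsample(x, factor):
    y = np.zeros((len(x) - 1) * factor + 1)
    y[::factor] = x
    return y

def iterate_lowpass(g0, i):
    """g0^{(i)}[n] via the product (4.4.9)."""
    g = np.array([1.0])
    for k in range(i):
        g = np.convolve(g, upsample(g0, 2**k))
    return g

def graphical(g0, i, t):
    """phi^{(i)}(t) = 2^{i/2} g0^{(i)}[n] on [n/2^i, (n+1)/2^i), as in (4.4.10)."""
    g = 2**(i / 2) * iterate_lowpass(g0, i)
    idx = np.floor(t * 2**i).astype(int)
    out = np.zeros_like(t)
    ok = (idx >= 0) & (idx < len(g))
    out[ok] = g[idx[ok]]
    return out

t = np.linspace(0.0, 3.0, 3001, endpoint=False)

# Example 4.3: g0 = [1/sqrt(2), 0, 0, 1/sqrt(2)].
bad = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)
bad_values = np.unique(np.round(graphical(bad, 8, t), 12))
bad_change = np.max(np.abs(graphical(bad, 8, t) - graphical(bad, 9, t)))

# Hat-function filter [1/2, 1, 1/2], rescaled by 1/sqrt(2) so that G0(1) = sqrt(2).
hat = np.array([0.5, 1.0, 0.5]) / np.sqrt(2.0)
hat_err = np.max(np.abs(graphical(hat, 8, t) - np.maximum(0.0, 1.0 - np.abs(t - 1.0))))

print("values taken by phi^(8) for Example 4.3:", bad_values)
print("max |phi^(8) - phi^(9)| for Example 4.3:", bad_change)
print("max |phi^(8) - hat| for [1/2, 1, 1/2]:", hat_err)
```

The counter-example keeps jumping between 0 and 1 from one iteration to the next. The hat filter's graphical function stays within $2^{-i}$ of the hat.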

A main point of the previous example is that failure of $L_2$ convergence indicates a breakdown of the orthonormal basis construction that is based on iterated filter banks. Several sufficient conditions for $L_2$ convergence have been given. Mallat shows in [180] that a sufficient condition is

$$|M_0(\omega)| > 0, \qquad |\omega| \le \frac{\pi}{2}.$$

It is easy to verify that the above example does not meet it, since $M_0(\pi/3) = 0$. Another sufficient condition, by Daubechies, also allows one to impose regularity. This will be discussed in Proposition 4.7. Necessary and sufficient conditions for $L_2$ convergence are more involved, and were derived by Cohen [55] and Lawton [169, 170] (see [73] for a discussion of these conditions).
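Mallat's condition is easy to test on a frequency grid. Below is a minimal sketch for the length-4 Daubechies filter and for the filter of Example 4.3. It assumes NumPy and the normalization $M_0(\omega) = G_0(e^{j\omega})/G_0(1)$ of (4.4.21). A grid search can only suggest, not prove, that the minimum is positive.

```python
import numpy as np

def M0(g0, w):
    """Normalized frequency response M0(w) = G0(e^{jw}) / G0(1)."""
    n = np.arange(len(g0))
    return np.exp(-1j * np.outer(w, n)) @ g0 / np.sum(g0)

w = np.linspace(-np.pi / 2, np.pi / 2, 10001)   # includes w = +-pi/2 and w = 0

s3 = np.sqrt(3.0)
daub4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
example_4_3 = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)

min_daub4 = np.min(np.abs(M0(daub4, w)))
min_ex43 = np.min(np.abs(M0(example_4_3, w)))
print("min |M0| on |w| <= pi/2, Daubechies length 4:", min_daub4)
print("min |M0| on |w| <= pi/2, Example 4.3:        ", min_ex43)
```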

The next example considers the orthogonal filter family that was derived in 
Section 3.2.3. It shows that very different behavior can be obtained within a family. 
Figure 4.17 Counter-example to convergence. The discrete-time filter has impulse response $[1/\sqrt{2}, 0, 0, 1/\sqrt{2}]$. (a) $\varphi^{(1)}(t)$. (b) $\varphi^{(2)}(t)$. (c) $\varphi^{(i)}(t)$.



Example 4.4 Iteration of Length-4 Orthogonal Family

Consider a 4-tap orthogonal filter bank. From the cascade structure discussed in Section 3.2.3 (Example 3.3), the 4-tap lowpass filter has the impulse response

$$g_0[n] = [\cos\alpha_0 \cos\alpha_1,\ \cos\alpha_1 \sin\alpha_0,\ -\sin\alpha_0 \sin\alpha_1,\ \cos\alpha_0 \sin\alpha_1]. \tag{4.4.25}$$

In order to force this filter to have a zero at $\omega = \pi$, it is necessary that $\alpha_0 + \alpha_1 = \pi/4$. Choosing $\alpha_0 = \pi/3$ and $\alpha_1 = -\pi/12$ leads to a double zero at $\omega = \pi$ and corresponds to a Daubechies' filter of length 4. In Figure 4.18, we show iterates of the orthogonal filter in (4.4.25) from $\alpha_0 = \pi/3$ (the Daubechies' filter) to $\alpha_0 = \pi/2$ (the Haar filter), with $\alpha_1$ being equal to $\pi/4 - \alpha_0$. As can be seen, iterated filters around the Daubechies' filter look regular as well. The continuity of the Daubechies' scaling function will be shown below.
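A short sketch evaluates (4.4.25) along the family $\alpha_1 = \pi/4 - \alpha_0$ and checks orthonormality. It counts the zeros at $\omega = \pi$ by repeated division by $(1 + e^{-j\omega})$. It also confirms that $\alpha_0 = \pi/3$ reproduces the length-4 Daubechies coefficients $(1+\sqrt3,\, 3+\sqrt3,\, 3-\sqrt3,\, 1-\sqrt3)/(4\sqrt2)$. NumPy's `numpy.polynomial.polynomial` module is assumed, and the helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def lattice_lowpass(a0, a1):
    """4-tap orthogonal lowpass filter (4.4.25) with angles a0, a1."""
    c0, s0, c1, s1 = np.cos(a0), np.sin(a0), np.cos(a1), np.sin(a1)
    return np.array([c0 * c1, c1 * s0, -s0 * s1, c0 * s1])

def zeros_at_pi(g, tol=1e-9):
    """Multiplicity of the zero of G0(e^{jw}) at w = pi. G0 is treated as the
    polynomial sum_n g[n] x^n in x = e^{-jw}, so w = pi corresponds to x = -1."""
    p = np.asarray(g, dtype=float)
    count = 0
    while len(p) > 1 and abs(P.polyval(-1.0, p)) < tol:
        p, _ = P.polydiv(p, [1.0, 1.0])    # divide by (1 + x)
        count += 1
    return count

rows = []
for a0 in np.linspace(np.pi / 3, np.pi / 2, 5):
    g = lattice_lowpass(a0, np.pi / 4 - a0)
    norm2 = float(np.sum(g**2))
    even_shift = float(g[0] * g[2] + g[1] * g[3])    # <g[n], g[n+2]>
    rows.append((a0, norm2, even_shift, zeros_at_pi(g)))
    print(f"a0 = {a0:.4f}: ||g||^2 = {norm2:.12f}, <g[n], g[n+2]> = {even_shift:+.1e}, "
          f"zeros at pi: {rows[-1][3]}")

s3 = np.sqrt(3.0)
daub_ref = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
print("a0 = pi/3 matches Daubechies:",
      np.allclose(lattice_lowpass(np.pi / 3, -np.pi / 12), daub_ref))
```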



The above example should give an intuition for the notion of regularity. The Haar filter, leading to a discontinuous function, is less regular than the Daubechies filter. In the literature, regularity is somewhat loosely defined (continuity in [194], continuity and differentiability in [181]). As hinted in the spline example, zeros at the aliasing frequency $\omega = \pi$ (or $z = -1$) play a key role for the regularity of the filter. First, let us show that a zero at $\omega = \pi$ is necessary for the limit function to exist. There are several proofs of this result (for example in [92]), and we follow Rioul's derivation [239].

Figure 4.18 Iterated orthogonal lowpass filters of length 4 with one zero at $\omega = \pi$ (or $\alpha_1 = \pi/4 - \alpha_0$). For $\alpha_0 = \pi/3$, there are two zeros at $\pi$ and this leads to a regular iterated filter of length 4. This corresponds to the Daubechies' scaling function. The sixth iteration is shown.

Given a lowpass filter $G_0(z)$ and its iteration $G_0^{(i)}(z)$ (see (4.4.9)), consider the associated graphical function $\varphi^{(i)}(t)$ (see (4.4.10)).

PROPOSITION 4.6 Necessity of a Zero at Aliasing Frequency

For the limit $\varphi(t) = \lim_{i\to\infty} \varphi^{(i)}(t)$ to exist, it is necessary that $G_0(-1) = 0$.

Proof

For the limit of $\varphi^{(i)}(t)$ to exist, it is necessary that, as $i$ increases, the even and odd samples of $g_0^{(i)}[n]$ tend to the same limit sequence. This limit sequence has an associated limit function $\varphi(2t)$. Use the fact that (see (4.4.4))

$$G_0^{(i)}(z) = G_0(z)\, G_0^{(i-1)}(z^2) = \left(G_e(z^2) + z^{-1} G_o(z^2)\right) G_0^{(i-1)}(z^2),$$

where the subscripts $e$ and $o$ stand for the even- and odd-indexed samples of $g_0[n]$, respectively. We can write the even- and odd-indexed samples of $g_0^{(i)}[n]$ in the $z$-transform domain as

$$G_e^{(i)}(z) = G_e(z)\, G_0^{(i-1)}(z),$$

$$G_o^{(i)}(z) = G_o(z)\, G_0^{(i-1)}(z),$$
Figure 4.19 Eighth iteration of the filter which fails to converge because of the absence of an exact zero at $\omega = \pi$. The filter is a Smith and Barnwell filter of length 8 [271] (see Table 3.2).



or, in the time domain,

$$g_0^{(i)}[2n] = \sum_k g_0[2k]\, g_0^{(i-1)}[n - k], \tag{4.4.26}$$

$$g_0^{(i)}[2n+1] = \sum_k g_0[2k+1]\, g_0^{(i-1)}[n - k]. \tag{4.4.27}$$

When considering the associated continuous function $\varphi^{(i)}(t)$ and its limit as $i$ goes to infinity, the left side of the above two equations tends to $\varphi(2t)$. For the right side, note that $k$ is bounded while $n$ is not. Because the intervals for the interpolation diminish as $1/2^i$, the shift by $k$ vanishes as $i$ goes to infinity, and $g_0^{(i-1)}[n - k]$ also leads to $\varphi(2t)$. That is, (4.4.26) and (4.4.27) become equal and

$$\left[\sum_k g_0[2k]\right] \varphi(2t) = \left[\sum_k g_0[2k+1]\right] \varphi(2t),$$

which, assuming that $\varphi(2t)$ is not zero for some $t$, leads to

$$\sum_k g_0[2k] = \sum_k g_0[2k+1].$$

Since $G_0(e^{j\omega})|_{\omega=\pi} = \sum_k g_0[2k] - \sum_k g_0[2k+1]$, we have verified that if $\varphi(t)$ is to exist, the filter necessarily has a zero at $\omega = \pi$.
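The polyphase relations (4.4.26) and (4.4.27) and the final identity can be checked numerically. Below is a minimal sketch that assumes NumPy and uses hypothetical helper names, with the length-4 Daubechies filter as the example.

```python
import numpy as np

def upsample(x, factor):
    y = np.zeros((len(x) - 1) * factor + 1)
    y[::factor] = x
    return y

def iterate_lowpass(g0, i):
    """g0^{(i)}[n] via the product (4.4.9), i.e. G0(z) G0^{(i-1)}(z^2)."""
    g = np.array([1.0])
    for k in range(i):
        g = np.convolve(g, upsample(g0, 2**k))
    return g

s3 = np.sqrt(3.0)
g0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))

i = 5
gi = iterate_lowpass(g0, i)
gprev = iterate_lowpass(g0, i - 1)

# (4.4.26)-(4.4.27): even (odd) samples of g0^{(i)} equal the even (odd) taps
# of g0 convolved with g0^{(i-1)}.
even_rhs = np.convolve(g0[0::2], gprev)
odd_rhs = np.convolve(g0[1::2], gprev)
print("(4.4.26) holds:", np.allclose(gi[0::2], even_rhs))
print("(4.4.27) holds:", np.allclose(gi[1::2], odd_rhs))

# Necessary condition: equal even and odd tap sums, i.e. G0(-1) = 0.
sum_even, sum_odd = np.sum(g0[0::2]), np.sum(g0[1::2])
G0_at_minus1 = np.sum(g0 * (-1.0) ** np.arange(len(g0)))
print(f"even-tap sum = {sum_even:.12f}, odd-tap sum = {sum_odd:.12f}, "
      f"G0(-1) = {G0_at_minus1:.1e}")
```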

Note that a zero at $\omega = \pi$ is not sufficient, as demonstrated by the filter with impulse response $[1/\sqrt{2}, 0, 0, 1/\sqrt{2}]$ (see Example 4.3). Another interpretation of the above result can be made in the Fourier domain, when considering the product in (4.4.21). This product is $2\pi \cdot 2^i$-periodic. Consider its value at $\omega = \pi 2^i$:

$$\prod_{k=1}^{i} M_0\!\left(\pi 2^{i-k}\right) = M_0(\pi) \prod_{k=1}^{i-1} M_0\!\left(2\pi\, 2^{i-k-1}\right) = M_0(\pi),$$

since $M_0(\omega)$ is $2\pi$-periodic and $M_0(0) = 1$. That is, unless $M_0(\pi)$ is exactly zero, there is a nonzero Fourier component at an arbitrarily high frequency. This indicates that $g_0^{(i)}[2n]$ and $g_0^{(i)}[2n+1]$ will never be the same. This results in highest-frequency "wiggles" in the iterated impulse response. As an example, we show, in Figure 4.19, the iteration of a filter which is popular in subband coding [271], but which does not have an exact zero at $\omega = \pi$. The resulting iterated function has small wiggles and will not converge. Note that most filters designed for subband coding have high (but maybe not infinite) attenuation at $\omega = \pi$; thus the problem is usually minor.

A Sufficient Condition for Regularity In [71], Daubechies studies the regularity of iterated filters in detail and gives a very useful sufficient condition for an iterated filter and its associated graphical function to converge to a continuous function. Factor $M_0(\omega)$ as

$$M_0(\omega) = \left(\frac{1 + e^{-j\omega}}{2}\right)^N R(\omega).$$

Because of the above necessary condition, we know that $N$ has to be at least equal to 1. Define $B$ as

$$B = \sup_{\omega\in[0,2\pi]} |R(\omega)|.$$

Then the following result due to Daubechies holds [71]:

PROPOSITION 4.7

If

$$B < 2^{N-1}, \tag{4.4.28}$$

then the limit function $\varphi^{(i)}(t)$ as $i \to \infty$ converges pointwise to a continuous function $\varphi(t)$ with the Fourier transform

$$\Phi(\omega) = \prod_{k=1}^{\infty} M_0\!\left(\frac{\omega}{2^k}\right). \tag{4.4.29}$$

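Proof

It is sufficient to show that, for large enough $|\omega|$, $\Phi(\omega)$ decays faster than $C(1 + |\omega|)^{-1}$. This indicates that $\varphi(t)$ will be continuous (since $\Phi(\omega)$ is then absolutely integrable). Rewrite (4.4.29) as follows:

$$\prod_{k=1}^{\infty} M_0\!\left(\frac{\omega}{2^k}\right) = \prod_{k=1}^{\infty} \left(\frac{1 + e^{-j\omega/2^k}}{2}\right)^N \prod_{k=1}^{\infty} R\!\left(\frac{\omega}{2^k}\right).$$

In the above, the first product on the right side is a smoothing part and equals, up to the phase factor $e^{-jN\omega/2}$,

$$\left(\frac{\sin(\omega/2)}{\omega/2}\right)^N,$$

which leads to a decay of the order of $C'(1 + |\omega|)^{-N}$. But then, there is the effect of the remainder $R(\omega)$. Recall that $|R(0)| = 1$. Now, $|R(\omega)|$ can be bounded above by $1 + c|\omega|$, for some $c$, and thus $|R(\omega)| \le e^{c|\omega|}$. Consider now $\prod_{k=1}^{\infty} R(\omega/2^k)$ for $|\omega| < 1$. In particular,

$$\sup_{|\omega|<1} \prod_{k=1}^{\infty} \left|R\!\left(\frac{\omega}{2^k}\right)\right| \le \prod_{k=1}^{\infty} e^{c/2^k} = e^{c(1/2 + 1/4 + \cdots)} = e^{c}.$$

Thus, for $|\omega| < 1$, we have an upper bound. For any $\omega$ with $|\omega| \ge 1$, there exists $J \ge 1$ such that $2^{J-1} \le |\omega| < 2^J$. Therefore, split the infinite product into two parts:

$$\prod_{k=1}^{\infty} \left|R\!\left(\frac{\omega}{2^k}\right)\right| = \prod_{k=1}^{J} \left|R\!\left(\frac{\omega}{2^k}\right)\right| \prod_{k=1}^{\infty} \left|R\!\left(\frac{\omega}{2^J 2^k}\right)\right|.$$

Since $|\omega/2^J| < 1$, we can bound the second product by $e^c$. The first product is smaller than, or equal to, $B^J$. Thus

$$\prod_{k=1}^{\infty} \left|R\!\left(\frac{\omega}{2^k}\right)\right| \le B^J e^c \le c'(1 + |\omega|)^{\log_2 B},$$

where we used $B^J = 2^{J\log_2 B} \le (2|\omega|)^{\log_2 B}$. Putting all this together, we finally get

$$\left|\prod_{k=1}^{\infty} M_0\!\left(\frac{\omega}{2^k}\right)\right| \le c''(1 + |\omega|)^{-N + \log_2 B} = c''(1 + |\omega|)^{-1-\epsilon},$$

with $\epsilon = N - 1 - \log_2 B > 0$ by (4.4.28).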

Let us check the Haar filter, or $M_0(\omega) = \frac{1}{2}(1 + e^{-j\omega}) \times 1$. Here, $N = 1$ and the supremum of the remainder is one. Therefore, the inequality (4.4.28) is not satisfied. Since the bound in (4.4.28) is sufficient but not necessary, we cannot infer discontinuity of the limit. However, we know that the Haar function is discontinuous at two points. On the other hand, the length-4 Daubechies' filter (see Example 4.4) yields

$$M_0(\omega) = \left(\frac{1 + e^{-j\omega}}{2}\right)^2 \frac{1}{2}\left[1 + \sqrt{3} + (1 - \sqrt{3})\,e^{-j\omega}\right]$$

and the maximum of $|R(\omega)|$, attained at $\omega = \pi$, is $B = \sqrt{3}$. Since $N = 2$, the bound (4.4.28) is met and continuity of the limit function $\varphi(t)$ is proven.
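The factorization and the bound are easy to compute. The sketch below strips factors $(1 + e^{-j\omega})/2$ from $M_0$ by polynomial division, then estimates $B$ on a frequency grid. NumPy is assumed, and the helper names are hypothetical. A grid maximum is only an estimate of the supremum, although here the maximum at $\omega = \pi$ lies on the grid.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def factor_zeros_at_pi(m0, tol=1e-9):
    """Split M0 (ascending coefficients in x = e^{-jw}) as ((1+x)/2)^N R(x)."""
    r = np.asarray(m0, dtype=float)
    n_zeros = 0
    while len(r) > 1 and abs(P.polyval(-1.0, r)) < tol:
        r, _ = P.polydiv(r, [0.5, 0.5])      # divide by (1 + x)/2
        n_zeros += 1
    return n_zeros, r

def sup_abs_on_circle(r, num=4096):
    """Grid estimate of B = sup_{w in [0, 2pi]} |R(w)|."""
    x = np.exp(-1j * np.linspace(0.0, 2 * np.pi, num, endpoint=False))
    return float(np.max(np.abs(P.polyval(x, r))))

s3 = np.sqrt(3.0)
# M0(w) = G0(e^{jw}) / sqrt(2) for the length-4 Daubechies filter.
m0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8.0
N, R = factor_zeros_at_pi(m0)
B = sup_abs_on_circle(R)
print("N =", N, " R coefficients =", R, " B =", B, " 2^(N-1) =", 2.0**(N - 1))
print("Daubechies' condition (4.4.28) met:", B < 2.0**(N - 1))

# Haar: N = 1 and R = 1, so B = 1 is not below 2^0 = 1 and the test is inconclusive.
N_h, R_h = factor_zeros_at_pi(np.array([0.5, 0.5]))
B_h = sup_abs_on_circle(R_h)
print("Haar: N =", N_h, " B =", B_h)
```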

A few remarks are in order. First, there are variations in using the above sufficient condition. For example, one can test the cascade of $l$ filters with respect to upsampling by 2. Calling $B_l$ the following supremum,

$$B_l = \sup_{\omega\in[0,2\pi]} \left|\prod_{k=0}^{l-1} R\!\left(2^k\omega\right)\right|,$$

the bound (4.4.28) becomes

$$B_l < 2^{l(N-1)}.$$

Obviously, as $l$ becomes large, we get a better approximation, since the cascade resembles the iterated function. Another variation consists in leaving some of the zeros at $\omega = \pi$ in the remainder, so as to attenuate the supremum $B$. If there is a factorization that meets the bound, continuity is shown.
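The cascade variant requires only a small change to the grid search. This hedged sketch assumes NumPy. It reuses the remainder $R(\omega) = \frac{1}{2}[1 + \sqrt{3} + (1 - \sqrt{3})e^{-j\omega}]$ of the length-4 Daubechies filter, with $N = 2$. It computes $B_l$ and the per-stage factor $B_l^{1/l}$, which can never exceed $B$ since $B_l \le B^l$.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Remainder of the length-4 Daubechies M0 (N = 2), in powers of x = e^{-jw}:
# R(w) = ((1 + sqrt(3)) + (1 - sqrt(3)) e^{-jw}) / 2.
s3 = np.sqrt(3.0)
R = np.array([1 + s3, 1 - s3]) / 2.0
N = 2

def B_cascade(r, l, num=8192):
    """Grid estimate of B_l = sup_w |prod_{k=0}^{l-1} R(2^k w)|."""
    w = np.linspace(0.0, 2 * np.pi, num, endpoint=False)
    prod = np.ones_like(w, dtype=complex)
    for k in range(l):
        prod *= P.polyval(np.exp(-1j * (2**k) * w), r)
    return float(np.max(np.abs(prod)))

Bs = []
for l in range(1, 6):
    Bl = B_cascade(R, l)
    Bs.append(Bl)
    print(f"l={l}: B_l = {Bl:.4f}, B_l^(1/l) = {Bl**(1.0 / l):.4f}, "
          f"bound 2^(l(N-1)) = {2.0**(l * (N - 1)):.0f}")
```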

Then, additional zeros at $\omega = \pi$ (beyond the one needed to ensure that the limit exists) will ensure continuity, differentiability, and so on. More precisely, if instead of (4.4.28) we have

$$B < 2^{N-1-l}, \qquad l = 1, 2, \ldots,$$

then $\varphi(t)$ is $l$-times continuously differentiable.

Other Methods for Investigating Regularity Daubechies' sufficient condition might fail and the filter might still be regular. Another criterion gives an upper bound on regularity, that is, it can be used to disprove smoothness. It is Cohen's fixed-point method [55], which we describe briefly with an example.




Figure 4.20 Critical frequencies used in Cohen's fixed-point method (the shape of the Fourier transform is only for the sake of example). Panels show $|M_0(\omega/2)|$, $|M_0(\omega/4)|$, and $|M_0(\omega/8)|$.



When evaluating the product (4.4.21), certain critical frequencies will align. These are fixed points of the mapping $\omega \mapsto 2\omega$ modulo $2\pi$. For example, $\omega = \pm 2\pi/3$ is a critical frequency (more precisely, the pair $\{2\pi/3, -2\pi/3\}$ is mapped onto itself, since $2 \cdot 2\pi/3 = 4\pi/3 \equiv -2\pi/3$ modulo $2\pi$, and $|M_0(-\omega)| = |M_0(\omega)|$ for real filters). This can be seen in Figure 4.20, where we show $M_0(\omega/2)$, $M_0(\omega/4)$, and $M_0(\omega/8)$. It is clear from this figure that the absolute value of the product of $M_0(\omega/2)$, $M_0(\omega/4)$, and $M_0(\omega/8)$ evaluated at $\omega = 16\pi/3$ is equal to $|M_0(2\pi/3)|^3$. In general

$$\left|\prod_{k=1}^{i} M_0\!\left(\frac{\omega}{2^k}\right)\right|_{\omega = 2^i \cdot 2\pi/3} = \left|M_0\!\left(\frac{2\pi}{3}\right)\right|^i.$$

From this, it is clear that if $|M_0(2\pi/3)|$ is larger than 1/2, the decay of the Fourier transform will not be of the order of $1/\omega$, and continuity would be disproved. Because it involves only certain values of the Fourier transform, the fixed-point method can be used to test large filters quite easily. For a thorough discussion of the fixed-point method, we refer to [55, 57].
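Because the test needs only one value of $M_0$, it takes only a few lines of code. This hedged sketch assumes NumPy and the normalization $M_0(0) = 1$, and its helper name is hypothetical. It evaluates $|M_0(2\pi/3)|$ for the length-4 Daubechies filter and for the filter of Example 4.3. It also confirms the alignment identity above at $\omega = 2^i \cdot 2\pi/3$.

```python
import numpy as np

def M0(m0, w):
    """M0(w) = sum_n m0[n] e^{-jwn}, with m0 normalized so that M0(0) = 1."""
    n = np.arange(len(m0))
    return complex(np.sum(m0 * np.exp(-1j * w * n)))

s3 = np.sqrt(3.0)
filters = {
    "Daubechies length 4": np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8.0,
    "Example 4.3": np.array([1.0, 0.0, 0.0, 1.0]) / 2.0,
}

w0 = 2 * np.pi / 3
i = 6
results = {}
for name, m0 in filters.items():
    a = abs(M0(m0, w0))
    # Alignment: |prod_{k=1}^{i} M0(w / 2^k)| at w = 2^i * 2pi/3 should equal a^i.
    w = 2**i * w0
    prod = abs(np.prod([M0(m0, w / 2**k) for k in range(1, i + 1)]))
    results[name] = (a, prod, a**i)
    verdict = "continuity disproved" if a > 0.5 else "test inconclusive"
    print(f"{name}: |M0(2pi/3)| = {a:.4f}, |product| = {prod:.4e}, "
          f"a^i = {a**i:.4e} ({verdict})")
```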

Another possible method for studying regularity uses $L \times L$ matrices corresponding to a length-$L$ filter downsampled by 2 (that is, the rows contain the
filter coefficients but are shifted by 2). By considering a subset of eigenvalues of 
these matrices, it is possible to estimate the regularity of the scaling function using 
Littlewood-Paley theory (which divides the Fourier domain into dyadic blocks and 
uses norms on these dyadic blocks to characterize, for example, continuity). These 
methods are quite sophisticated and we refer to [57, 73] for details. 

Finally, Rioul [239, 242] derived direct regularity estimates on the iterated filters which not only give sharp estimates but are quite intuitive. The idea is to consider iterated filters $g_0^{(i)}[n]$ and the maximum difference between successive coefficients. For continuity, it is clear that this difference has to go to zero. The normalization is now different because we consider the discrete-time sequences directly. Normalizing $G_0(z)$ such that $G_0(1) = 2$ and requiring again the necessary condition $G_0(-1) = 0$, we have

$$\lim_{i\to\infty} \max_n \left|g_0^{(i)}[n+1] - g_0^{(i)}[n]\right| = 0,$$

where $g_0^{(i)}[n]$ is the usual iterated sequence. For the limit function $\varphi(t)$ to be continuous, Rioul shows that the convergence has to be uniform in $n$ and that the following bound has to be satisfied for a positive $\alpha$:

$$\max_n \left|g_0^{(i)}[n+1] - g_0^{(i)}[n]\right| \le C\, 2^{-i\alpha}.$$

Taking higher-order differences leads to testing differentiability as well [239, 242]. The elegance of this method is that it deals directly with the iterated sequences, and associates discrete successive differences with continuous-time derivatives in an intuitive manner. Because it is computationally oriented, it can be run easily on large filters.
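Rioul's criterion lends itself to a direct experiment. Below is a minimal sketch that uses the length-4 Daubechies filter rescaled so that $G_0(1) = 2$. It assumes NumPy, and its helper names are hypothetical. It tracks $\max_n |g_0^{(i)}[n+1] - g_0^{(i)}[n]|$ and reports $\log_2$ of the ratio between successive iterations as a rough local estimate of $\alpha$. This crude estimator is our own illustration, not Rioul's exact procedure.

```python
import numpy as np

def upsample(x, factor):
    y = np.zeros((len(x) - 1) * factor + 1)
    y[::factor] = x
    return y

def iterate_lowpass(g0, i):
    """g0^{(i)}[n] via the product (4.4.9)."""
    g = np.array([1.0])
    for k in range(i):
        g = np.convolve(g, upsample(g0, 2**k))
    return g

def max_successive_difference(g0, i):
    """max_n |g^{(i)}[n+1] - g^{(i)}[n]|, with the sequence zero-extended at both ends."""
    g = np.concatenate(([0.0], iterate_lowpass(g0, i), [0.0]))
    return float(np.max(np.abs(np.diff(g))))

s3 = np.sqrt(3.0)
# Rioul's normalization G0(1) = 2: the length-4 Daubechies filter scaled by sqrt(2).
g0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 4.0

diffs = [max_successive_difference(g0, i) for i in range(1, 13)]
# If diffs ~ C 2^{-i alpha}, then log2(diffs[k-1] / diffs[k]) estimates alpha.
alpha_estimates = [np.log2(diffs[k - 1] / diffs[k]) for k in range(1, len(diffs))]
for i, (d, a) in enumerate(zip(diffs[1:], alpha_estimates), start=2):
    print(f"i={i:2d}: max diff = {d:.3e}, local alpha estimate = {a:.3f}")
```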
4.4.4 Daubechies' Family of Regular Filters and Wavelets

To conclude the discussion of iterated filters and regularity, we give the explicit construction of Daubechies' family of orthonormal wavelets. For more details, the reader is referred to [71, 73]. Note that this is another derivation of the maximally flat orthogonal filters studied in Chapter 3. Recall that perfect reconstruction together with orthogonality can be expressed as (see Section 3.2.3)

$$|M_0(e^{j\omega})|^2 + |M_0(e^{j(\omega+\pi)})|^2 = 1, \tag{4.4.30}$$

where $M_0(e^{j\omega}) = G_0(e^{j\omega})/\sqrt{2}$ is normalized so that $M_0(1) = 1$, and we assume $M_0(e^{j\pi}) = 0$. For regularity, the following is imposed on $M_0(e^{j\omega})$:

$$M_0(e^{j\omega}) = \left[\frac{1}{2}\left(1 + e^{j\omega}\right)\right]^N R(e^{j\omega}),$$

where $N \ge 1$. Note that $R(1) = 1$ and that $|M_0(e^{j\omega})|^2$ can be written as

$$|M_0(e^{j\omega})|^2 = \left[\cos^2\frac{\omega}{2}\right]^N |R(e^{j\omega})|^2. \tag{4.4.31}$$

Since $|R(e^{j\omega})|^2 = R(e^{j\omega})\,R^*(e^{j\omega}) = R(e^{j\omega})\,R(e^{-j\omega})$ for real coefficients, it can be expressed as a polynomial in $\cos\omega$ or in $\sin^2(\omega/2) = (1 - \cos\omega)/2$. Using the shorthands $y = \cos^2(\omega/2)$ and $P(1 - y) = |R(e^{j\omega})|^2$, we can write (4.4.30) using (4.4.31) as

$$y^N P(1 - y) + (1 - y)^N P(y) = 1, \tag{4.4.32}$$

where

$$P(y) \ge 0 \quad \text{for } y \in [0, 1]. \tag{4.4.33}$$

Suppose that we have a polynomial P{y) satisfying (4.4.32) and (4.4.33) and more- 
over 

supji2(e*")| = sup ve[0)1] |P(y)|5 < 2 N ~ l . 

Then, there exists an orthonormal basis associated with $G_0(e^{j\omega})$, since the iterated filter will converge to a continuous scaling function (following Proposition 4.7) from which a wavelet basis can be obtained (Theorem 4.5).

Thus, the problem becomes one of finding P(y) satisfying (4.4.32) and (4.4.33), followed by extracting $R(e^{j\omega})$ as the "root" of P. Daubechies shows [71, 73] that any polynomial P solving (4.4.32) is of the form

$$P(y) = \sum_{j=0}^{N-1} \binom{N-1+j}{j} y^j + y^N Q(y), \qquad (4.4.34)$$






where Q is a polynomial that is antisymmetric with respect to y = 1/2. For the specific family in question, Daubechies constructs filters of minimum order, that is, with Q = 0 (see also Problem 4.13). Note that such maximally flat filters (they have a maximum number of zeros at ω = π) were derived long before filter banks and wavelets by Herrmann [134], in the context of FIR filter design.
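Identity (4.4.32) for the minimum-order choice Q = 0 can be checked with exact rational arithmetic. In this sketch, the helper names are ours. Polynomials in y are stored as coefficient lists of Fraction values, and the left-hand side of (4.4.32) is expanded; it should collapse to the constant 1. All coefficients of P are positive for this choice, so (4.4.33) holds trivially.

```python
from fractions import Fraction
from math import comb

def mul(a, b):
    """Product of polynomials stored as coefficient lists (index = power of y)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, z in enumerate(b):
            out[i + j] += x * z
    return out

def add(a, b):
    n = max(len(a), len(b))
    a = a + [Fraction(0)] * (n - len(a))
    b = b + [Fraction(0)] * (n - len(b))
    return [x + z for x, z in zip(a, b)]

def power(a, k):
    out = [Fraction(1)]
    for _ in range(k):
        out = mul(out, a)
    return out

ONE_MINUS_Y = [Fraction(1), Fraction(-1)]    # coefficients of 1 - y

def substitute_one_minus_y(p):
    """Coefficients of p(1 - y) from those of p(y)."""
    out = [Fraction(0)]
    for j, c in enumerate(p):
        out = add(out, [c * t for t in power(ONE_MINUS_Y, j)])
    return out

def daubechies_P(N):
    """(4.4.34) with Q = 0: P(y) = sum_{j<N} binom(N-1+j, j) y^j."""
    return [Fraction(comb(N - 1 + j, j)) for j in range(N)]

def bezout_lhs(N):
    """Coefficients of y^N P(1-y) + (1-y)^N P(y), trailing zeros removed."""
    P = daubechies_P(N)
    y_N = [Fraction(0)] * N + [Fraction(1)]
    lhs = add(mul(y_N, substitute_one_minus_y(P)), mul(power(ONE_MINUS_Y, N), P))
    while len(lhs) > 1 and lhs[-1] == 0:
        lhs.pop()
    return lhs

for N in range(1, 7):
    print(N, [int(c) for c in daubechies_P(N)], bezout_lhs(N) == [1])
```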

With such a P, the remaining task is to determine R. Using spectral factorization, one can construct such an R from a given P as explained in Section 2.5.2. Systematically choosing zeros inside the unit circle for $R(e^{j\omega})$, one obtains the minimum phase solution for $G_0(e^{j\omega})$. Choosing zeros inside and outside the unit circle leads to mixed phase filters. There is no linear phase solution (except the Haar case, when N = 1 and $R(e^{j\omega}) = 1$).
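Here is a numerical sketch of the spectral factorization, under stated assumptions. First, write $|R(e^{j\omega})|^2 = P(1-y)$ as a polynomial $S(x)$ in $x = \cos\omega$, using $1 - y = (1-x)/2$. Next, find the roots of S with a small Durand-Kerner iteration. This is a generic root finder chosen only to keep the example dependency-free, not the procedure of Section 2.5.2. Then map each root $x_k$ to the solution of $z + 1/z = 2x_k$ inside the unit circle. Finally, normalize $R(z) \propto \prod_k (z - z_k)$ so that $R(1) = 1$. All function names are hypothetical. Keeping only zeros inside the unit circle gives the minimum phase solution. Coefficients are listed by increasing powers of $e^{j\omega}$, as in Example 4.5.

```python
import cmath
from math import comb, sqrt

def poly_mul(a, b):
    """Product of polynomials stored as coefficient lists (index = power)."""
    out = [0j] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def cos_poly(N):
    """S(x) = P((1 - x)/2), x = cos(w): the polynomial |R(e^{jw})|^2 = P(1 - y)."""
    S = [0j] * N
    term = [1 + 0j]                                # ((1 - x)/2)^j
    for j in range(N):
        c = comb(N - 1 + j, j)
        for k, t in enumerate(term):
            S[k] += c * t
        term = poly_mul(term, [0.5, -0.5])
    return S

def roots(coeffs, iters=500):
    """Durand-Kerner iteration for the roots of sum_k coeffs[k] x^k."""
    n = len(coeffs) - 1
    monic = [c / coeffs[-1] for c in coeffs]
    def p(x):
        acc = 0j
        for c in reversed(monic):                  # Horner evaluation
            acc = acc * x + c
        return acc
    z = [(0.4 + 0.9j) ** k for k in range(n)]      # standard distinct starting points
    for _ in range(iters):
        new = []
        for i, zi in enumerate(z):
            den = 1 + 0j
            for j, zj in enumerate(z):
                if j != i:
                    den *= zi - zj
            new.append(zi - p(zi) / den)
        z = new
    return z

def daubechies_lowpass(N):
    """Minimum phase g0 = sqrt(2) * M0, ordered by increasing powers of e^{jw}."""
    R = [1 + 0j]
    for x in roots(cos_poly(N)):
        w = cmath.sqrt(x * x - 1)
        zk = x - w if abs(x - w) < 1 else x + w    # root of z + 1/z = 2x inside |z| = 1
        R = poly_mul(R, [-zk, 1])                  # factor (z - zk)
    total = sum(R)
    R = [c / total for c in R]                     # normalize so that R(1) = 1
    M0 = R
    for _ in range(N):
        M0 = poly_mul(M0, [0.5, 0.5])              # multiply by (1 + e^{jw})/2
    return [sqrt(2) * c.real for c in M0]

for N in range(1, 5):
    print(N, [round(c, 6) for c in daubechies_lowpass(N)])
```

For N = 2, this reproduces $\sqrt{2}\,M_0$ of Example 4.5. For larger N, the orthonormality and zero-moment conditions give a quick sanity check.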

Example 4.5 

Let us illustrate the construction for the case N = 2. Using (4.4.34) with N = 2 and Q = 0,

$$P(y) = 1 + 2y.$$



From (4.4.32),

$$|R(e^{j\omega})|^2 = P(1-y) = 3 - 2\cos^2(\omega/2) = 2 - \cos\omega = 2 - \frac{1}{2}e^{j\omega} - \frac{1}{2}e^{-j\omega},$$



where we used $y = \cos^2(\omega/2) = \frac{1}{2}(1 + \cos\omega)$. Now take the spectral factorization of $|R(e^{j\omega})|^2$. The roots are $r_1 = 2 + \sqrt{3}$ and $r_2 = 2 - \sqrt{3} = 1/r_1$. Thus



$$|R(e^{j\omega})|^2 = \frac{1}{4 - 2\sqrt{3}}\left[e^{j\omega} - (2-\sqrt{3})\right]\left[e^{-j\omega} - (2-\sqrt{3})\right].$$



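A possible $R(e^{j\omega})$ is therefore

$$R(e^{j\omega}) = \frac{1}{\sqrt{3} - 1}\left[e^{j\omega} - (2-\sqrt{3})\right] = \frac{1}{2}\left[(1+\sqrt{3})e^{j\omega} + 1 - \sqrt{3}\right]$$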



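and the resulting $M_0(e^{j\omega})$ is

$$\begin{aligned} M_0(e^{j\omega}) &= \left[\frac{1}{2}\left(1 + e^{j\omega}\right)\right]^2 \frac{1}{2}\left[(1+\sqrt{3})e^{j\omega} + 1 - \sqrt{3}\right] \\ &= \frac{1}{8}\left[(1+\sqrt{3})e^{j3\omega} + (3+\sqrt{3})e^{j2\omega} + (3-\sqrt{3})e^{j\omega} + 1 - \sqrt{3}\right]. \end{aligned}$$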



This filter is the 4-tap Daubechies' filter (within a phase shift to make it causal and a scale factor of $1/\sqrt{2}$). That is, by computing the iterated filters and the associated continuous-time functions (see (4.4.12)-(4.4.13)), one obtains the D2 wavelet and scaling function as shown in Figure 4.4. The regularity (continuity) of this filter was discussed after Proposition 4.7.
Figure 4.21 Daubechies' iterated graphical functions for N = 3, ..., 6 (the eighth iteration is plotted, and the iterates converge to their corresponding scaling functions). Their regions of support are from 0 to 2N - 1; thus, only for N = 3, 4 are they plotted in their entirety. For N = 5, 6, their amplitude is negligible after t = 7.0. Recall that the case N = 2 is given in Figure 4.4. (a) N = 3. (b) N = 4. (c) N = 5. (d) N = 6.



Figure 4.21 gives the iterated graphical functions for N = 3, . . . , 6 (the eighth 
iteration is plotted and they converge to their corresponding scaling functions). 
Recall that the case N = 2 is given in Figure 4.4. Table 4.2 gives the R(z) functions 
for N = 2, . . . , 6, which can be factored into maximally regular filters. The lowpass 
filters obtained by a minimum phase factorization are given in Table 4.3. Table 4.4 
gives the regularity of the first few Daubechies' filters. 
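Table 4.2 can be regenerated from (4.4.34) with Q = 0. On the unit circle, $1 - y = \sin^2(\omega/2) = (2 - z - z^{-1})/4$ with $z = e^{j\omega}$. The remainder $R(z)$ of Table 4.2 plays the role of $|R(e^{j\omega})|^2 = P(1-y)$ in (4.4.32), so it equals $\sum_j \binom{N-1+j}{j}\left((2 - z - z^{-1})/4\right)^j$. The sketch below, with helper names of our own, expands this with exact fractions and compares the result with the rows of Table 4.2 as printed.

```python
from fractions import Fraction
from math import comb

def laurent_mul(a, b):
    """Multiply symmetric Laurent polynomials stored from z^-d up to z^d."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def remainder_R(N):
    """R(z) = sum_j binom(N-1+j, j) u^j with u = (2 - z - 1/z)/4."""
    u = [Fraction(-1, 4), Fraction(1, 2), Fraction(-1, 4)]
    row = [Fraction(0)] * (2 * N - 1)
    term = [Fraction(1)]                       # u^j, length 2j + 1
    for j in range(N):
        offset = (N - 1) - j                   # centre u^j inside the length-(2N-1) row
        for k, t in enumerate(term):
            row[offset + k] += comb(N - 1 + j, j) * t
        term = laurent_mul(term, u)
    return row

# Table 4.2 as printed: N -> (e in the 2^-e prefactor, integer coefficients)
TABLE_4_2 = {
    2: (1, [-1, 4, -1]),
    3: (3, [3, -18, 38, -18, 3]),
    4: (4, [-5, 40, -131, 208, -131, 40, -5]),
    5: (7, [35, -350, 1520, -3650, 5018, -3650, 1520, -350, 35]),
    6: (8, [-63, 756, -4067, 12768, -25374, 32216, -25374, 12768, -4067, 756, -63]),
}

for N, (e, coeffs) in TABLE_4_2.items():
    scaled = [c * 2 ** e for c in remainder_R(N)]
    print(N, scaled == coeffs)
```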

This concludes our discussion of iterated filter bank constructions leading to wavelet bases. Other variations are possible by looking at other filter banks such as biorthogonal filter banks or IIR filter banks. Assuming regularity, they lead to biorthogonal wavelet bases with compact support and wavelets with exponential decay (see Section 4.6 for more details).

Table 4.2 Minimum degree remainder polynomials R(z) such that $P(z) = 2^{-2N+1}(1 + z)^N (1 + z^{-1})^N R(z)$ is valid.

N    Coefficients of R(z)
2    $2^{-1}$ [-1, 4, -1]
3    $2^{-3}$ [3, -18, 38, -18, 3]
4    $2^{-4}$ [-5, 40, -131, 208, -131, 40, -5]
5    $2^{-7}$ [35, -350, 1520, -3650, 5018, -3650, 1520, -350, 35]
6    $2^{-8}$ [-63, 756, -4067, 12768, -25374, 32216, -25374, 12768, -4067, 756, -63]

Table 4.3 First few maximally flat Daubechies' filters. N is the number of zeros at ω = π and equals L/2, where L is the length of the filter. The lowpass filter $g_0[n]$ is given, and the highpass filter can be obtained as $g_1[n] = (-1)^n g_0[-n + 2N - 1]$. These are obtained from a minimum phase factorization of P(z) corresponding to Table 4.2.



4.5 Wavelet Series and Its Properties 

Until now, we have seen ways of building orthonormal bases with structure. It was shown how such bases arise naturally from the multiresolution framework. We also discussed ways of constructing these bases, both directly in the Fourier domain and starting from discrete-time bases (filter banks).

The aim in this section is to define the wavelet series expansion together with its properties, enumerate some general properties of the basis functions, and demonstrate how one computes the wavelet series expansion of a function.
Table 4.4 Hölder regularity estimates for the first few Daubechies' filters (from [73]). The estimates given below are lower bounds. For example, for N = 3, finer estimates show that the function is actually differentiable [73].

N    α(N)
2    0.500
3    0.915
4    1.275
5    1.596
6    1.888



4.5.1 Definition and Properties 

Definition 4.8 

Assuming a multiresolution analysis defined by Axioms (4.2.1)-(4.2.6) and the mother wavelet $\psi(t)$ given in (4.2.14), any function $f \in L_2(\mathbb{R})$ can be expressed as

$$f(t) = \sum_{m,n\in\mathbb{Z}} F[m,n]\, \psi_{m,n}(t), \qquad (4.5.1)$$

where

$$F[m,n] = \langle \psi_{m,n}(t), f(t) \rangle = \int_{-\infty}^{\infty} \psi_{m,n}(t)\, f(t)\, dt. \qquad (4.5.2)$$



We have assumed a real wavelet (otherwise, a conjugate is necessary). Equation 
(4.5.2) is the analysis and (4.5.1) the synthesis formula. We will list several impor- 
tant properties of the wavelet series expansion. 

Linearity Suppose that the operator $T$ is defined as

$$T[f(t)] = F[m,n] = \langle \psi_{m,n}(t), f(t) \rangle.$$

Then for any $a, b \in \mathbb{R}$,

$$T[af(t) + bg(t)] = aT[f(t)] + bT[g(t)],$$

that is, the wavelet series operator is linear. Its proof follows from the linearity of 
the inner product. 
Shift Recall that the Fourier transform has the following shift property: If a signal and its Fourier transform are denoted by $f(t)$ and $F(\omega)$, respectively, then the signal $f(t-\tau)$ will have $e^{-j\omega\tau}F(\omega)$ as its Fourier transform (see Section 2.4.2).

Consider now what happens in the wavelet series case. Suppose that the function and its transform coefficients are denoted by $f(t)$ and $F[m,n]$, respectively. If we shift the signal by $\tau$, that is, $f(t-\tau)$, then

$$F'[m,n] = \int_{-\infty}^{\infty} \psi_{m,n}(t)\, f(t-\tau)\, dt = \int_{-\infty}^{\infty} 2^{-m/2}\, \psi(2^{-m}t - n + 2^{-m}\tau)\, f(t)\, dt.$$

For the above to be a coefficient from the original transform $F[m,n]$, one must have that

$$2^{-m}\tau \in \mathbb{Z},$$

or $\tau = 2^m k$, $k \in \mathbb{Z}$. Therefore, the wavelet series expansion possesses the following shift property: If a signal and its transform coefficients are denoted by $f(t)$ and $F[m,n]$, then the signal $f(t-\tau)$, $\tau = 2^m k$, $k \in \mathbb{Z}$, will have $F[m', n - 2^{-m'}\tau]$, $m' \le m$, as its transform coefficients, that is,

$$f(t - 2^m k) \longleftrightarrow F[m', n - 2^{m-m'}k], \quad k \in \mathbb{Z},\ m' \le m.$$

Thus, if a signal has a scale-limited expansion

$$f(t) = \sum_{n\in\mathbb{Z}} \sum_{m=-\infty}^{M_2} F[m,n]\, \psi_{m,n}(t),$$

then this signal will possess the weak shift property with respect to shifts by $2^{M_2}k$, that is,

$$f(t - 2^{M_2}k) \longleftrightarrow F[m, n - 2^{M_2-m}k], \quad -\infty < m \le M_2.$$
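A discrete analogue of this shift property is easy to check. The sketch below uses our own simplified setting: a periodic sequence of length 64 and the orthonormal Haar pair $[1, 1]/\sqrt{2}$, $[1, -1]/\sqrt{2}$. Both are assumptions, not the continuous-time statement itself. A circular shift by $2^J k$ samples moves the level-$m$ detail coefficients by $2^{J-m}k$ for every $m \le J$. At level $J+1$, the required shift would be fractional and the coefficients change.

```python
import math

R2 = 1 / math.sqrt(2)

def haar_analysis(x, levels):
    """Periodic orthonormal Haar analysis; returns ([d1, ..., d_levels], approximation)."""
    s, details = list(x), []
    for _ in range(levels):
        half = len(s) // 2
        details.append([R2 * (s[2 * n] - s[2 * n + 1]) for n in range(half)])
        s = [R2 * (s[2 * n] + s[2 * n + 1]) for n in range(half)]
    return details, s

def circular_shift(x, tau):
    """y[t] = x[t - tau], with indices taken modulo len(x)."""
    return [x[(t - tau) % len(x)] for t in range(len(x))]

def max_abs_diff(a, b):
    return max(abs(p - q) for p, q in zip(a, b))

x = [math.sin(0.3 * t) + 0.1 * t for t in range(64)]
J, k = 3, 1                                  # shift by tau = 2^J * k = 8 samples
d, _ = haar_analysis(x, 4)
d_shifted, _ = haar_analysis(circular_shift(x, 2 ** J * k), 4)
for m in range(1, J + 1):
    predicted = circular_shift(d[m - 1], 2 ** (J - m) * k)
    print(m, max_abs_diff(d_shifted[m - 1], predicted))
```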

Scaling Recall the scaling property of the Fourier transform: If a signal and its Fourier transform are denoted by $f(t)$ and $F(\omega)$, then the scaled version of the signal $f(at)$ will have $(1/|a|)\,F(\omega/a)$ as its transform (see Section 2.4.2). The wavelet series expansion $F'[m,n]$ of $f'(t) = f(at)$, $a > 0$, is

$$F'[m,n] = \int_{-\infty}^{\infty} \psi_{m,n}(t)\, f(at)\, dt = \frac{1}{a}\int_{-\infty}^{\infty} 2^{-m/2}\, \psi\!\left(\frac{2^{-m}}{a}\,t - n\right) f(t)\, dt.$$
[Figure: grid of scale m versus shift n]

Figure 4.22 Dyadic sampling of the time-frequency plane in the wavelet series expansion. The dots indicate the center of the wavelets $\psi_{m,n}(t)$.



Thus, when $2^m/a = 2^p$, $p \in \mathbb{Z}$, or $a = 2^k$, $k \in \mathbb{Z}$, then $F'[m,n]$ can be obtained from $F[m,n]$, the wavelet transform of $f(t)$:

$$f(2^{-k}t) \longleftrightarrow 2^{k/2} F[m-k, n], \quad k \in \mathbb{Z}.$$

Scaling by factors that are not powers of two requires reinterpolation. That is, either one reinterpolates the signal and then takes the wavelet expansion, or one interpolates the wavelet series coefficients. The former method is more immediate.

Parseval's Identity Parseval's identity, as seen for the Fourier-type expansions (see Section 2.4), holds for the wavelet series as well. That is, the orthonormal family $\{\psi_{m,n}\}$ satisfies (see Theorem 4.5)

$$\sum_{m,n\in\mathbb{Z}} |\langle \psi_{m,n}, f \rangle|^2 = \|f\|^2, \quad f \in L_2(\mathbb{R}).$$



Dyadic Sampling and Time-Frequency Tiling When considering a series expansion, it is important to locate the basis functions in the time-frequency plane. The sampling in time, at scale m, is done with a period of $2^m$, since $\psi_{m,n}(t) = \psi_{m,0}(t - 2^m n)$. In scale, powers of two are considered. Since frequency is the inverse of scale, we find that if the wavelet is centered around $\omega_0$, then $\Psi_{m,n}(\omega)$ is centered around $\omega_0/2^m$. This leads to a dyadic sampling of the time-frequency plane, as shown in Figure 4.22. Note that the scale axis (or inverse frequency) is logarithmic. On a linear scale, we have the equivalent time-frequency tiling as was shown in Figure 2.12(d).
[Figure: two panels on the grid of scale m versus shift n]

Figure 4.23 (a) Region of coefficients F[m,n] that will be influenced by the value of the function at $t_0$. (b) Region of influence of the Fourier component $F(\omega_0)$.



Localization One of the reasons why wavelets are so popular is their good localization in both time and frequency. We will discuss this next.

Time Localization Suppose that one is interested in the signal around $t = t_0$. Then a valid question is: Which values $F[m,n]$ will carry some information about the signal $f(t)$ at $t_0$, that is, which region of the $(m, n)$ grid will give information about $f(t_0)$?

Suppose a wavelet $\psi(t)$ is compactly supported on the interval $[-n_1, n_2]$. Thus, $\psi_{m,0}(t)$ is supported on $[-n_1 2^m, n_2 2^m]$ and $\psi_{m,n}(t)$ is supported on $[(-n_1 + n)2^m, (n_2 + n)2^m]$. Therefore, at scale m, wavelet coefficients with index n satisfying

$$(-n_1 + n)2^m < t_0 < (n_2 + n)2^m$$

will be influenced. This can be rewritten as

$$2^{-m}t_0 - n_2 < n < 2^{-m}t_0 + n_1.$$

Figure 4.23(a) shows this region on the (m, n) grid. 
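The index bounds are simple to evaluate. The following sketch uses function names of our own and an assumed example support $[-1, 2]$, that is, $n_1 = 1$, $n_2 = 2$. For a point $t_0$, it lists the shifts n at each scale m whose wavelets overlap $t_0$. It also gives the converse interval of t that contributes to a given $F[m_0, n_0]$. The number of influenced coefficients is at most $n_1 + n_2$ at every scale, which produces the cone of Figure 4.23(a).

```python
import math

def influenced_shifts(t0, m, n1, n2):
    """Shifts n at scale m with 2^-m t0 - n2 < n < 2^-m t0 + n1 (support [-n1, n2])."""
    lo = t0 / 2 ** m - n2
    hi = t0 / 2 ** m + n1
    return list(range(math.floor(lo) + 1, math.ceil(hi)))

def contributing_interval(m0, n0, n1, n2):
    """Open interval of t whose values f(t) influence F[m0, n0]."""
    return ((-n1 + n0) * 2 ** m0, (n2 + n0) * 2 ** m0)

# Assumed wavelet support [-1, 2] (n1 = 1, n2 = 2); signal point t0 = 5.
for m in range(-2, 4):
    print(m, influenced_shifts(5.0, m, 1, 2))
print(contributing_interval(2, 1, 1, 2))
```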

The converse question is: Given a point $F[m_0, n_0]$ in the wavelet series expansion, which region of the signal contributed to it? From the support of $\psi_{m,n}(t)$, it follows that $f(t)$ for t satisfying

$$(-n_1 + n_0)2^{m_0} < t < (n_2 + n_0)2^{m_0}$$

influences $F[m_0, n_0]$.
Frequency Localization Suppose we are interested in localization, but now in the frequency domain. Since the Fourier transform of $\psi_{m,n}(t) = 2^{-m/2}\psi(2^{-m}t - n)$ is $2^{m/2}\,\Psi(2^m\omega)\,e^{-j2^m n\omega}$, we can write $F[m,n]$ using Parseval's formula as

$$F[m,n] = \int_{-\infty}^{\infty} \psi_{m,n}(t)\, f(t)\, dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} 2^{m/2}\, F(\omega)\, \Psi^*(2^m\omega)\, e^{j2^m n\omega}\, d\omega.$$

Now, suppose that a wavelet $\psi(t)$ vanishes in the Fourier domain outside the region $[\omega_{\min}, \omega_{\max}]$.⁶ At scale m, the support of $\Psi_{m,n}(\omega)$ will be $[\omega_{\min}/2^m, \omega_{\max}/2^m]$. Therefore, a frequency component at $\omega_0$ influences the wavelet series at scale m if

$$\frac{\omega_{\min}}{2^m} < \omega_0 < \frac{\omega_{\max}}{2^m}$$

is satisfied, or if the following range of scales is influenced:

$$\log_2\!\left(\frac{\omega_{\min}}{\omega_0}\right) < m < \log_2\!\left(\frac{\omega_{\max}}{\omega_0}\right).$$

This is shown in Figure 4.23(b). Conversely, given a scale $m_0$, all frequencies of the signal between $\omega_{\min}/2^{m_0}$ and $\omega_{\max}/2^{m_0}$ will influence the expansion at that scale.
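The range of responding scales can be computed directly as well. In this sketch, $[\pi, 2\pi]$ is the ideal octave band of the sinc wavelet. The band $[\pi/2, 4\pi]$ is an assumed, wider effective band standing in for a wavelet that is only approximately bandlimited (see footnote 6). The function name is ours.

```python
import math

def influenced_scales(w0, w_min, w_max):
    """Integer scales m with log2(w_min / w0) < m < log2(w_max / w0)."""
    lo = math.log2(w_min / w0)
    hi = math.log2(w_max / w0)
    return list(range(math.floor(lo) + 1, math.ceil(hi)))

# Ideal octave band [pi, 2*pi]: a single scale responds to a pure tone.
print(influenced_scales(0.3 * math.pi, math.pi, 2 * math.pi))
# Assumed wider effective band [pi/2, 4*pi]: several scales respond.
print(influenced_scales(math.pi, math.pi / 2, 4 * math.pi))
```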

Existence of Scale-Limited Signals Because of the importance of bandlimited signals in signal processing, a natural question is: Are there any scale-limited signals? An easy way to construct such a signal would be to add, for example, Haar wavelets from a range of scales $m_0 < m < m_1$. Thus, the wavelet series expansion will possess a limited number of scales, that is, transform coefficients $F[m,n]$ will exist only for $m_0 < m < m_1$.

However, note what happens with the signal $f(t - \epsilon)$, for $\epsilon$ not a multiple of $2^{m_1}$. The scale-limitedness property is lost, and the expansion can have an infinite number of coefficients. For more details, see [116] and Problem 4.1. Note that the sinc wavelet expansion does not have this problem, since it is intrinsically band/scale limited.

Characterization of Singularities The Fourier transform and Fourier series can be used to characterize the regularity of a signal by looking at the decay of the transform or series coefficients (see Appendix 2.C.2). One can use the wavelet transform and the wavelet series behavior in a similar way. There is one notable advantage over the Fourier case, however, in that one can characterize local regularity. Remember that the Fourier transform gives a global characterization only. The wavelet transform and the wavelet series, because high frequency basis functions become arbitrarily sharp in time, allow one to look at the regularity at a particular location independent of the regularity elsewhere. This property will be discussed in more detail for the continuous-time wavelet transform in Chapter 5. The basic properties of regularity characterization carry over to the wavelet series case, since it is a sampled version of the continuous wavelet transform and since the sampling grid becomes arbitrarily dense at high frequencies (we consider "well-behaved" functions only, that is, of bounded variation).

⁶ Therefore, the wavelet cannot be compactly supported. However, the discussion holds approximately for wavelets which have most of their energy in the band $[\omega_{\min}, \omega_{\max}]$.

Figure 4.24 Two-scale equation for the D2 scaling function given in Figure 4.4(a).

In a dual manner, we can make statements about the decay of the wavelet series 
coefficients depending on the regularity of the analyzed signal. This gives a way to 
quantify the approximation property of the wavelet series expansion for a signal of 
a given regularity. Again, the approximation property is local (since regularity is local).

Note that in all these discussions, one assumes that the wavelet is more regular 
than the signal (otherwise, the wavelet's regularity interferes). Also, because of the 
sampling involved in the wavelet series, one might have to go to very fine scales in 
order to get good estimates. Therefore, it is easier to use the continuous wavelet 
transform or a highly oversampled discrete-time wavelet transform (see Chapter 5 
and [73]). 



4.5.2 Properties of Basis Functions 

Let us summarize some of the important properties of the wavelet series basis 
functions. While some of them (such as the two-scale equation property) have been 
seen earlier, we will summarize them here for completeness. 
Two-Scale Equation Property The scaling function can be built from itself (see Figure 4.24). Recall the definition of a multiresolution analysis. The scaling function $\varphi(t)$ belongs to $V_0$. However, since $V_0 \subset V_{-1}$, $\varphi(t)$ belongs to $V_{-1}$ as well. We know that $\varphi(t - n)$ is an orthonormal basis for $V_0$, and thus $\sqrt{2}\,\varphi(2t - n)$ is an orthonormal basis for $V_{-1}$. This means that any function from $V_0$, including $\varphi(t)$, can be expressed as a linear combination of the basis functions from $V_{-1}$, that is, $\varphi(2t - n)$. This leads to the following two-scale equation:

$$\varphi(t) = \sqrt{2}\sum_n g_0[n]\, \varphi(2t - n). \qquad (4.5.3)$$

On the other hand, using the same argument for the wavelet $\psi(t) \in W_0 \subset V_{-1}$, one can see that

$$\psi(t) = \sqrt{2}\sum_n g_1[n]\, \varphi(2t - n). \qquad (4.5.4)$$

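These two relations can be expressed in the Fourier domain as

$$\Phi(\omega) = \frac{1}{\sqrt{2}}\sum_n g_0[n]\, e^{-jn\omega/2}\, \Phi\!\left(\frac{\omega}{2}\right) = M_0\!\left(\frac{\omega}{2}\right) \Phi\!\left(\frac{\omega}{2}\right), \qquad (4.5.5)$$

$$\Psi(\omega) = \frac{1}{\sqrt{2}}\sum_n g_1[n]\, e^{-jn\omega/2}\, \Phi\!\left(\frac{\omega}{2}\right) = M_1\!\left(\frac{\omega}{2}\right) \Phi\!\left(\frac{\omega}{2}\right). \qquad (4.5.6)$$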
As an illustration, consider the two-scale equation in the case of the Daubechies' scaling function. Figure 4.24 shows how the D2 scaling function is built using four scaled and shifted versions of itself.

The functions $M_0(\omega)$ and $M_1(\omega)$ in (4.5.5) and (4.5.6) are $2\pi$-periodic and correspond to scaled versions of the filters $g_0[n]$, $g_1[n]$ (see (4.4.5) and (4.4.8)), which can be used to build filter banks (see Section 4.5.2 below).

The two-scale equation can also be used as a starting point in constructing a multiresolution analysis. In other words, instead of starting from an axiomatic definition of a multiresolution analysis, choose $\varphi(t)$ such that (4.5.3) holds, with $\sum_n |g_0[n]|^2 < \infty$ and $0 < A \le \sum_n |\Phi(\omega + 2\pi n)|^2 \le B < \infty$. Then define $V_m$ to be the closed subspace spanned by $2^{-m/2}\varphi(2^{-m}t - n)$. All the other axioms follow (an orthogonalization step is involved if $\varphi(t)$ is not orthogonal to its integer translates). For more details, refer to [73].

Moment Properties of Wavelets Recall that the lowpass filter $g_0[n]$, in an iterated filter bank scheme, has at least one zero at $\omega = \pi$, and thus $g_1[n]$ has at least one zero at $\omega = 0$. Since $\Phi(0) = 1$ (from the normalization of $M_0(\omega)$), it follows that $\Psi(\omega)$ has at least one zero at $\omega = 0$. Therefore,

$$\int_{-\infty}^{\infty} \psi(t)\, dt = \Psi(0) = 0,$$
which is to be expected, since $\psi(t)$ is a bandpass function. In general, if $G_0(e^{j\omega})$ has an Nth-order zero at $\omega = \pi$, the wavelet $\Psi(\omega)$ has an Nth-order zero at $\omega = 0$. Using the moment theorem of the Fourier transform (see Section 2.4.2), it follows that

$$\int_{-\infty}^{\infty} t^n\, \psi(t)\, dt = 0, \quad n = 0, \ldots, N-1,$$

that is, the first N moments of the wavelet are zero. Besides wavelets constructed from iterated filter banks, we have seen Meyer's and Battle-Lemarié wavelets. Meyer's wavelet, which is not based on the iteration of a rational function, has by construction an infinite "zero" at the origin, that is, an infinite number of zero moments. The Battle-Lemarié wavelet, on the other hand, is based on the Nth-order B-spline function. The orthogonal filter $G_0(e^{j\omega})$ has an (N+1)th-order zero at π (see (4.3.18)), and the wavelet thus has N + 1 zero moments.
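In discrete time, the zero of $G_1(e^{j\omega})$ at $\omega = 0$ shows up as vanishing moments $\sum_n n^p g_1[n] = 0$, $p = 0, \ldots, N-1$. The sketch below checks this for the 4-tap filter of Example 4.5 (N = 2). It builds the highpass filter with the rule $g_1[n] = (-1)^n g_0[-n + 2N - 1]$ given with Table 4.3. The helper name is ours.

```python
import math

s3 = math.sqrt(3)
# sqrt(2) * M0 from Example 4.5, ordered by increasing powers of e^{jw}
g0 = [c / (4 * math.sqrt(2)) for c in (1 - s3, 3 - s3, 3 + s3, 1 + s3)]
N = 2
# highpass by the alternating-flip rule g1[n] = (-1)^n g0[-n + 2N - 1]
g1 = [(-1) ** n * g0[2 * N - 1 - n] for n in range(2 * N)]

def moment(h, p):
    """Discrete moment sum_n n^p h[n]."""
    return sum((n ** p) * c for n, c in enumerate(h))

for p in range(4):
    print(p, moment(g1, p))
```

The moments of order 0 and 1 vanish, while the moment of order 2 does not.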

The importance of zero moments comes from the following fact. Assume a length-L wavelet with N zero moments. Assume that the function $f(t)$ to be represented by the wavelet series expansion is a polynomial of order N - 1 in an interval $[t_0, t_1]$. Then, for sufficiently small scales (such that $2^m L < (t_1 - t_0)/2$), the wavelet expansion coefficients will automatically vanish in the region corresponding to $[t_0, t_1]$, since the inner product with each term of the polynomial will be zero. Another view is to consider the Taylor expansion of a function around a point $t_0$,

$$f(t_0 + \epsilon) = f(t_0) + \frac{f'(t_0)}{1!}\,\epsilon + \frac{f''(t_0)}{2!}\,\epsilon^2 + \cdots.$$

The wavelet expansion around $t_0$ now depends only on the terms of degree N and higher of the Taylor expansion, since the terms through degree N - 1 are zeroed out because of the N zero moments of the wavelet. If the function is smooth, the high-order terms of the Taylor expansion are very small. Because the wavelet series coefficients now depend only on Taylor coefficients of order N and larger, they will be very small as well.
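The annihilation of polynomials can be seen directly on sampled data. Using the same assumed 4-tap pair (N = 2), the sketch computes $d[n] = \sum_k g_1[k]\, x[2n + k]$, that is, inner products with even shifts of $g_1$ as in the analysis step of Section 4.5.3. It applies this away from the boundaries to a linear and to a quadratic sequence. The linear ramp gives zero. The quadratic, whose degree equals N, gives a nonzero constant.

```python
import math

s3 = math.sqrt(3)
g0 = [c / (4 * math.sqrt(2)) for c in (1 - s3, 3 - s3, 3 + s3, 1 + s3)]
g1 = [(-1) ** n * g0[3 - n] for n in range(4)]      # alternating flip, N = 2

def detail_coefficients(x, h):
    """d[n] = sum_k h[k] x[2n + k], only where the filter fits inside x."""
    return [sum(c * x[2 * n + k] for k, c in enumerate(h))
            for n in range((len(x) - len(h)) // 2 + 1)]

linear = [2.0 + 0.5 * t for t in range(32)]
quadratic = [t * t / 10 for t in range(32)]
print(max(abs(v) for v in detail_coefficients(linear, g1)))
print(detail_coefficients(quadratic, g1)[:3])
```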

These approximation features of wavelets with zero moments are important in 
approximation of smooth functions and operators and also in signal compression 
(see Chapter 7). 

Smoothness and Decay Properties of Wavelets In discussing the iterated filters leading to the Daubechies' wavelets, we pointed out that besides convergence, continuity or even differentiability of the wavelet was often desirable. While this regularity of the wavelet is linked to the number of zeros at $\omega = \pi$ of the lowpass filter $G_0(e^{j\omega})$, the link is not as direct as in the case of the zero-moment property seen above. In particular, there is no direct relation between these two properties.
Table 4.5 Zero moments, regularity, and decay of various wavelets. α(N) is a linearly increasing function of N which approaches 0.2075·N for large N. The Battle-Lemarié wavelet of order N is based on a B-spline of order N - 1. The Daubechies' wavelet of order N corresponds to a length-2N maximally flat orthogonal filter.
The regularity of all the wavelets discussed so far is indicated in Table 4.5. Regularity r means that the rth derivative exists almost everywhere. The localization or decay in time and frequency of all these wavelets is also indicated in the table.

Filter Banks Obtained from Wavelets Consider again (4.5.3) and (4.5.4). An interesting fact is that using the coefficients $g_0[n]$ and $g_1[n]$ for the synthesis lowpass and highpass filters, respectively, one obtains a perfect reconstruction orthonormal filter bank (as defined in Section 3.2.3). To check the orthonormality conditions for these filters, use the orthonormality conditions of the scaling function and the wavelet. Thus, start from

$$\langle \varphi(t + l), \varphi(t + k) \rangle = \delta[k - l],$$

$$\begin{aligned}
\langle \varphi(t + l), \varphi(t + k) \rangle &= \Big\langle \sqrt{2}\sum_n g_0[n]\, \varphi(2t + 2l - n),\ \sqrt{2}\sum_m g_0[m]\, \varphi(2t + 2k - m) \Big\rangle \\
&= \Big\langle \sqrt{2}\sum_{n'} g_0[n' + 2l]\, \varphi(2t - n'),\ \sqrt{2}\sum_{m'} g_0[m' + 2k]\, \varphi(2t - m') \Big\rangle \\
&= \sum_{n'} g_0[n' + 2l]\, g_0[n' + 2k] = \delta[l - k],
\end{aligned}$$

that is, the lowpass filter is orthogonal to its even translates. In a similar fashion, one can show that the lowpass filter is orthogonal to the highpass filter and its even translates. The highpass filter is orthogonal to its even translates as well. That is, $\{g_i[n - 2k]\}$, $i = 0, 1$, is an orthonormal set, and it can be used to build an orthogonal filter bank (see Section 3.2.3).
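These conditions can be verified numerically for a concrete pair. The sketch below uses the 4-tap lowpass filter of Example 4.5 with $g_1[n] = (-1)^n g_0[-n + 2N - 1]$, as given with Table 4.3. It evaluates $\sum_n a[n]\, b[n + 2k]$ for the three pairs (lowpass/lowpass, highpass/highpass, lowpass/highpass). The helper names are ours.

```python
import math

s3 = math.sqrt(3)
g0 = [c / (4 * math.sqrt(2)) for c in (1 - s3, 3 - s3, 3 + s3, 1 + s3)]
L = len(g0)
g1 = [(-1) ** n * g0[L - 1 - n] for n in range(L)]   # alternating flip

def even_shift_corr(a, b, k):
    """sum_n a[n] b[n + 2k], treating samples outside the supports as zero."""
    return sum(a[n] * b[n + 2 * k] for n in range(len(a)) if 0 <= n + 2 * k < len(b))

for k in (-1, 0, 1):
    print(k, even_shift_corr(g0, g0, k), even_shift_corr(g1, g1, k),
          even_shift_corr(g0, g1, k))
```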
4.5.3 Computation of the Wavelet Series and Mallat's Algorithm

An attractive feature of the wavelet series expansion is that the underlying mul- 
tiresolution structure leads to an efficient discrete-time algorithm based on a filter 
bank implementation. This connection was pointed out by Mallat [181]. The com- 
putational procedure is therefore referred to as Mallat's algorithm. 

Assume we start with a function $f(t) \in V_0$ and we are given the sequence $f^{(0)}[n] = \langle \varphi(t - n), f(t) \rangle$, $n \in \mathbb{Z}$. That is,

$$f(t) = \sum_{n=-\infty}^{\infty} f^{(0)}[n]\, \varphi(t - n). \qquad (4.5.7)$$

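We also assume that the axioms of multiresolution analysis hold. In searching for projections of $f(t)$ onto $V_1$ and $W_1$, we use the fact that $\varphi(t)$ and $\psi(t)$ satisfy two-scale equations. Consider first the projection onto $V_1$, that is,

$$f^{(1)}[n] = \left\langle \frac{1}{\sqrt{2}}\varphi\!\left(\frac{t}{2} - n\right), f(t) \right\rangle. \qquad (4.5.8)$$

Because $\varphi(t) = \sqrt{2}\sum_k g_0[k]\, \varphi(2t - k)$,

$$\frac{1}{\sqrt{2}}\varphi\!\left(\frac{t}{2} - n\right) = \sum_k g_0[k]\, \varphi(t - 2n - k). \qquad (4.5.9)$$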

Thus, (4.5.8) becomes

$$f^{(1)}[n] = \sum_k g_0[k]\, \langle \varphi(t - 2n - k), f(t) \rangle,$$

and using (4.5.7),

$$f^{(1)}[n] = \sum_k \sum_l g_0[k]\, f^{(0)}[l]\, \langle \varphi(t - 2n - k), \varphi(t - l) \rangle. \qquad (4.5.10)$$

Because of the orthogonality of $\varphi(t)$ with respect to its integer translates, the inner product in the above equation is equal to $\delta[l - 2n - k]$. Therefore, only the term with $l = 2n + k$ is kept from the second summation. With a change of variable, we can write (4.5.10) as

$$f^{(1)}[n] = \sum_k g_0[k - 2n]\, f^{(0)}[k].$$

With the definition $\tilde g_0[n] = g_0[-n]$, we obtain

$$f^{(1)}[n] = \sum_k \tilde g_0[2n - k]\, f^{(0)}[k], \qquad (4.5.11)$$
that is, the coefficients of the projection onto $V_1$ are obtained by filtering $f^{(0)}$ with $\tilde g_0$ and downsampling by 2. To calculate the projection onto $W_1$, we use the fact that $\psi(t) = \sqrt{2}\sum_k g_1[k]\, \varphi(2t - k)$. Calling $d^{(1)}[n]$ the coefficients of the projection onto $W_1$, or

$$d^{(1)}[n] = \left\langle \frac{1}{\sqrt{2}}\psi\!\left(\frac{t}{2} - n\right), f(t) \right\rangle,$$

and using the two-scale equation for $\psi(t)$ as well as the expansion for $f(t)$ given in (4.5.7), we find, similarly to (4.5.9)-(4.5.11),

$$\begin{aligned}
d^{(1)}[n] &= \sum_k \sum_l g_1[k]\, f^{(0)}[l]\, \langle \varphi(t - 2n - k), \varphi(t - l) \rangle \\
&= \sum_k \sum_l g_1[k]\, f^{(0)}[l]\, \delta[l - 2n - k] \\
&= \sum_l g_1[l - 2n]\, f^{(0)}[l] = \sum_l \tilde g_1[2n - l]\, f^{(0)}[l],
\end{aligned}$$

where $\tilde g_1[n] = g_1[-n]$. That is, the coefficients of the projection onto $W_1$ are obtained by filtering $f^{(0)}$ with $\tilde g_1$ and downsampling by 2, exactly as we obtained the projection onto $V_1$ using $\tilde g_0$. Of course, projections onto $V_2$ and $W_2$ can be obtained similarly by filtering $f^{(1)}$ and downsampling by 2. Therefore, the projections onto $W_m$, $m = 1, 2, 3, \ldots$, are obtained from $m - 1$ filterings with $\tilde g_0[n]$, each followed by downsampling by 2, and a final filtering by $\tilde g_1[n]$ followed by downsampling. This purely discrete-time algorithm to implement the wavelet series expansion is depicted in Figure 4.25.
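The analysis recursion can be sketched as follows. Two assumptions depart from the text. First, the sequences are finite and extended periodically, so each step maps 2M samples to M approximation and M detail coefficients. Second, the filters are the 4-tap pair of Example 4.5. The helper names are ours. The periodized transform is orthonormal, so the sketch also inverts it and checks Parseval's identity.

```python
import math

s3 = math.sqrt(3)
g0 = [c / (4 * math.sqrt(2)) for c in (1 - s3, 3 - s3, 3 + s3, 1 + s3)]
g1 = [(-1) ** n * g0[3 - n] for n in range(4)]

def analysis_step(f, h):
    """c[n] = sum_k h[k - 2n] f[k] = sum_k h[k] f[2n + k], with periodic extension."""
    M = len(f)
    return [sum(c * f[(2 * n + k) % M] for k, c in enumerate(h)) for n in range(M // 2)]

def synthesis_step(a, d):
    """Inverse of one step: f[l] = sum_n g0[l - 2n] a[n] + g1[l - 2n] d[n]."""
    M = 2 * len(a)
    f = [0.0] * M
    for n in range(len(a)):
        for k in range(len(g0)):
            f[(2 * n + k) % M] += g0[k] * a[n] + g1[k] * d[n]
    return f

def mallat(f0, levels):
    """Return the details [d1, ..., d_levels] and the final approximation."""
    details, f = [], list(f0)
    for _ in range(levels):
        details.append(analysis_step(f, g1))   # projection onto W_{j+1}
        f = analysis_step(f, g0)               # projection onto V_{j+1}
    return details, f

def inverse_mallat(details, f):
    for d in reversed(details):
        f = synthesis_step(f, d)
    return f

f0 = [math.exp(-((t - 20) / 6) ** 2) + (1.0 if t > 40 else 0.0) for t in range(64)]
details, approx = mallat(f0, 3)
energy_in = sum(x * x for x in f0)
energy_out = sum(x * x for d in details for x in d) + sum(x * x for x in approx)
print(energy_in, energy_out)
```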

A key question is how to obtain an orthogonal projection $\hat f(t)$ onto $V_0$ from an arbitrary signal $f(t)$. Because $\{\varphi(t - n)\}$ is an orthonormal basis for $V_0$, $\hat f(t)$ equals

$$\hat f(t) = \sum_n \langle \varphi(t - n), f(t) \rangle\, \varphi(t - n),$$

and $f(t) - \hat f(t)$ is orthogonal to $\varphi(t - n)$, $n \in \mathbb{Z}$. Thus, given an initial signal $f(t)$, we have to compute the set of inner products $f^{(0)}[n] = \langle \varphi(t - n), f(t) \rangle$. This, unlike the further decomposition which involves only discrete-time processing, requires continuous-time processing. However, if $V_0$ corresponds to a sufficiently fine resolution compared to the resolution of the input signal $f(t)$, then sampling $f(t)$ will be sufficient. This follows because $\varphi(t)$ is a lowpass filter with an integral equal to 1. If $f(t)$ is smooth and $\varphi(t)$ is sufficiently short-lived, then we have

$$\langle \varphi(t - n), f(t) \rangle \approx f(n).$$

Of course, if $V_0$ is not fine enough, one can start with $V_{-m}$ for m sufficiently large so that

$$\langle 2^{m/2}\varphi(2^m t - n), f(t) \rangle \approx 2^{-m/2} f(2^{-m} n).$$
Figure 4.25 Computation of the wavelet series coefficients. Starting with the coefficients $f^{(0)}[n] = \langle \varphi(t - n), f(t) \rangle$, $n \in \mathbb{Z}$, we obtain the wavelet expansion coefficients by a filter bank algorithm.



If f(t) has some regularity (for example, it is continuous), there will be a resolution 
at which sampling is a good enough approximation of the inner products needed 
to begin Mallat's algorithm. Generalizations of Mallat's algorithm, which include 
more general initial approximation problems, are derived in [261] and [296]. 
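The quality of the sampling approximation can be measured in a case where the inner products are known in closed form. The sketch assumes the Haar scaling function (the indicator of $[0, 1)$) and $f(t) = \sin t$. Then $\langle 2^{m/2}\varphi(2^m t - n), f(t)\rangle = 2^{m/2}[\cos(2^{-m}n) - \cos(2^{-m}(n+1))]$. The sketch reports the largest deviation from $2^{-m/2} f(2^{-m}n)$ on $[0, 4)$, rescaled by $2^{m/2}$. For this nonsymmetric φ, the error decreases roughly in proportion to $2^{-m}$. The function names are ours.

```python
import math

def exact_coefficient(F, m, n):
    """<2^{m/2} phi(2^m t - n), f(t)> for Haar phi = indicator of [0, 1); F' = f."""
    h = 2.0 ** -m
    return 2 ** (m / 2) * (F((n + 1) * h) - F(n * h))

def max_sampling_error(m, t_max=4.0):
    """Largest |exact - 2^{-m/2} f(2^-m n)| over 2^-m n in [0, t_max), times 2^{m/2}."""
    def F(t):
        return -math.cos(t)            # antiderivative of f(t) = sin(t)
    err = 0.0
    for n in range(int(t_max * 2 ** m)):
        approx = 2 ** (-m / 2) * math.sin(n * 2.0 ** -m)
        err = max(err, 2 ** (m / 2) * abs(exact_coefficient(F, m, n) - approx))
    return err

for m in range(2, 9):
    print(m, max_sampling_error(m))
```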

4.6 Generalizations in One Dimension 

In this section, we discuss some of the more common generalizations in one dimen- 
sion, most notably, the biorthogonal and recursive filter cases, as well as wavelets 
obtained from multichannel filter banks. For treatment of wavelets with rational 
dilation factors see [16] and [33]. 

4.6.1 Biorthogonal Wavelets 

Instead of orthogonal wavelet families, one can construct biorthogonal ones, that is, the wavelet used for the analysis is different from the one used at the synthesis [58]. Basically, we relax the orthogonality requirement used so far in this chapter. However, we still maintain the requirement that the sets of functions $\psi_{m,n}$ and $\tilde\psi_{m,n}$ are linearly independent and actually form a basis. In Chapter 5, this requirement will be relaxed, and we will work with linearly dependent sets or frames. Calling $\{\psi_{m,n}(t)\}$ and $\{\tilde\psi_{m,n}(t)\}$⁷ the families used at synthesis and analysis, respectively (m and n stand for dilation and shift), then, in a biorthogonal family, the following relation is satisfied:

$$\langle \psi_{m,n}(t), \tilde\psi_{k,l}(t) \rangle = \delta[m - k]\, \delta[n - l]. \qquad (4.6.1)$$

⁷ Note that here, the tilde does not denote time reversal, but is used for a dual function.

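If, in addition, the family is complete in a given space such as $L_2(\mathbb{R})$, then any function of the space can be written as

$$f(t) = \sum_m \sum_n \langle \psi_{m,n}, f \rangle\, \tilde\psi_{m,n}(t) \qquad (4.6.2)$$

$$\phantom{f(t)} = \sum_m \sum_n \langle \tilde\psi_{m,n}, f \rangle\, \psi_{m,n}(t), \qquad (4.6.3)$$

since $\psi$ and $\tilde\psi$ play dual roles. There are various ways to find such biorthogonal families. For example, one could construct a biorthogonal spline basis by simply not orthogonalizing the Battle-Lemarié wavelet.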

Another approach consists in starting with a biorthogonal filter bank and using 
the iterated filter bank method just as in the orthogonal case. Now, both the 
analysis and the synthesis filters (which are not just time-reversed versions of each 
other) have to be iterated. For example, one can use finite-length linear phase filters 
and obtain wavelets with symmetries and compact support (which is impossible in 
the orthogonal case). 

In a biorthogonal filter bank with analysis/synthesis filters $H_0(z)$, $H_1(z)$, $G_0(z)$, and $G_1(z)$, perfect reconstruction with FIR filters means that (see (3.2.21))

$$G_0(z)H_0(z) + G_0(-z)H_0(-z) = 2 \qquad (4.6.4)$$

and

$$H_1(z) = -z^{2k+1} G_0(-z), \qquad (4.6.5)$$

$$G_1(z) = -z^{-2k-1} H_0(-z) \qquad (4.6.6)$$

following (3.2.18), where $\det(H_m(z)) = 2z^{2k+1}$ (we assume noncausal analysis filters in this discussion). Now, given a polynomial $P(z)$ satisfying $P(z) + P(-z) = 2$, we can factor it into $P(z) = G_0(z)H_0(z)$ and use $\{H_0(z), G_0(z)\}$ as the analysis/synthesis lowpass filters of a biorthogonal perfect reconstruction filter bank (the highpass filters follow from (4.6.5)-(4.6.6)).
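The factorization step can be checked on the filters of Example 4.7 below: $G_0(z) = \frac{1}{2\sqrt{2}}(z + 2 + z^{-1})$ and $H_0(z) = \frac{1}{4\sqrt{2}}(-z^2 + 2z + 6 + 2z^{-1} - z^{-2})$. The sketch stores Laurent polynomials as {exponent: coefficient} dictionaries, a representation of our own. It first checks $P(z) + P(-z) = 2$. It then builds the highpass filters with k = 0 as $H_1(z) = -z^{2k+1} G_0(-z)$ and $G_1(z) = -z^{-2k-1} H_0(-z)$. We assume the convention that the output equals $\frac{1}{2}[G_0(z)H_0(z) + G_1(z)H_1(z)]X(z) + \frac{1}{2}[G_0(z)H_0(-z) + G_1(z)H_1(-z)]X(-z)$. Under that convention, the minus sign in $G_1$ is what cancels the alias term.

```python
import math
from collections import defaultdict

def clean(p, tol=1e-12):
    """Drop numerically zero coefficients."""
    return {e: c for e, c in p.items() if abs(c) > tol}

def mul(a, b):
    """Product of Laurent polynomials stored as {exponent: coefficient}."""
    out = defaultdict(float)
    for i, x in a.items():
        for j, y in b.items():
            out[i + j] += x * y
    return clean(out)

def add(a, b):
    out = defaultdict(float, a)
    for e, c in b.items():
        out[e] += c
    return clean(out)

def neg_z(p):
    """p(z) -> p(-z)."""
    return {e: c * (-1) ** e for e, c in p.items()}

r2 = math.sqrt(2)
G0 = {1: 1 / (2 * r2), 0: 2 / (2 * r2), -1: 1 / (2 * r2)}
H0 = {2: -1 / (4 * r2), 1: 2 / (4 * r2), 0: 6 / (4 * r2),
      -1: 2 / (4 * r2), -2: -1 / (4 * r2)}
P = mul(G0, H0)
halfband = add(P, neg_z(P))                               # (4.6.4): should be {0: 2}

k = 0
H1 = {e + 2 * k + 1: -c for e, c in neg_z(G0).items()}    # H1(z) = -z^{2k+1} G0(-z)
G1 = {e - 2 * k - 1: -c for e, c in neg_z(H0).items()}    # G1(z) = -z^{-2k-1} H0(-z)
distortion = add(mul(G0, H0), mul(G1, H1))
aliasing = add(mul(G0, neg_z(H0)), mul(G1, neg_z(H1)))
print(halfband, distortion, aliasing)
```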

We can iterate such a biorthogonal filter bank on the lowpass channel and find equivalent iterated filter impulse responses. Note that now, analysis and synthesis impulse responses are not simply time-reversed versions of each other (as in the orthogonal case), but are typically very different (since they depend on $H_0(z)$ and $G_0(z)$, respectively). We can define the iterated lowpass filters as

$$H_0^{(i)}(z) = \prod_{k=0}^{i-1} H_0\!\left(z^{2^k}\right), \qquad G_0^{(i)}(z) = \prod_{k=0}^{i-1} G_0\!\left(z^{2^k}\right).$$



For the associated limit functions to converge, it is necessary that both $H_0(z)$ and $G_0(z)$ have a zero at $z = -1$ (see Proposition 4.6). Therefore, following (4.6.4), we have that

$$G_0(1)\, H_0(1) = \Big(\sum_n g_0[n]\Big)\Big(\sum_n h_0[n]\Big) = 2.$$

That is, we can "normalize" the filters such that

$$\sum_n g_0[n] = \sum_n h_0[n] = \sqrt{2}.$$

This is necessary for the iteration to be well-defined (there is no square normalization as in the orthogonal case). Define

$$\tilde M_0(\omega) = \frac{H_0(e^{j\omega})}{\sqrt{2}}, \qquad M_0(\omega) = \frac{G_0(e^{j\omega})}{\sqrt{2}},$$

and the associated limit functions

$$\tilde\Phi(\omega) = \prod_{k=1}^{\infty} \tilde M_0\!\left(\frac{\omega}{2^k}\right), \qquad \Phi(\omega) = \prod_{k=1}^{\infty} M_0\!\left(\frac{\omega}{2^k}\right),$$

where the former is the scaling function at analysis (within time reversal) and the latter is the scaling function at synthesis. These two scaling functions can be very different, as shown in Example 4.6.

Example 4.6 

Consider a biorthogonal filter bank with length-4 linear phase filters. This is a one-parameter family with analysis and synthesis lowpass filters given by ($a \ne \pm 1$):

$$H_0(z) = \frac{1}{\sqrt{2}(a + 1)}\left(1 + az + az^2 + z^3\right),$$

$$G_0(z) = \frac{1}{\sqrt{2}(a - 1)}\left(-1 + az^{-1} + az^{-2} - z^{-3}\right).$$
Figure 4.26 Iteration of a lowpass filter with impulse response $c_a \cdot [1, a, a, 1]$ for $a \in [-3, 3]$. The sixth iteration is shown. For a = 3, the iteration converges to a quadratic spline. Note that for a = 0, there is no convergence, and a = ±1 does not correspond to a biorthogonal filter bank.



In Figure 4.26 we show the iteration of the filter $H_0(z)$ for a range of values a. Looking at the iterated filter for a and -a, one can see that there is no solution having both a regular analysis and a regular synthesis filter. For example, for a = 3, the analysis filter converges to a quadratic spline function, while the iterated synthesis filter exhibits fractal behavior and no regularity.
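The claim for a = 3 can be checked numerically. The sketch below scales $[1, a, a, 1]$ to sum to 2 and iterates it. It then compares the ith iterate, read on the grid $t = n/2^i$, with the quadratic B-spline on $[0, 3]$. The helper names are ours. The error decreases roughly like $2^{-i}$. Part of it comes from a fixed offset of about 1.5 samples between the iterate and this simple sampling grid.

```python
def iterate_filter(h, i):
    """Coefficients of H^(i)(z) = prod_{k=0}^{i-1} H(z^(2^k)) via H(z) * H^(i-1)(z^2)."""
    g = [1.0]
    for _ in range(i):
        up = [0.0] * (2 * len(g) - 1)
        up[::2] = g                              # H^(i)(z) -> H^(i)(z^2)
        out = [0.0] * (len(up) + len(h) - 1)
        for n, u in enumerate(up):
            for k, c in enumerate(h):
                out[n + k] += u * c
        g = out
    return g

def quadratic_bspline(t):
    """Quadratic B-spline supported on [0, 3]."""
    if 0 <= t < 1:
        return t * t / 2
    if 1 <= t < 2:
        return (-2 * t * t + 6 * t - 3) / 2
    if 2 <= t < 3:
        return (3 - t) ** 2 / 2
    return 0.0

def cascade_error(a, i):
    """max_n |h^(i)[n] - B(n / 2^i)| for the filter [1, a, a, 1] / (a + 1)."""
    h = [c / (a + 1) for c in (1, a, a, 1)]     # taps sum to 2
    g = iterate_filter(h, i)
    return max(abs(v - quadratic_bspline(n / 2 ** i)) for n, v in enumerate(g))

for i in (4, 6, 8):
    print(i, cascade_error(3, i))
```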



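In order to derive the biorthogonal wavelet family, we define

$$\tilde M_1(\omega) = \frac{H_1(e^{j\omega})}{\sqrt{2}}, \qquad M_1(\omega) = \frac{G_1(e^{j\omega})}{\sqrt{2}}, \qquad (4.6.7)$$

as well as (similarly to (4.4.15))

$$\tilde\Psi(\omega) = \tilde M_1\!\left(\frac{\omega}{2}\right) \prod_{k=2}^{\infty} \tilde M_0\!\left(\frac{\omega}{2^k}\right), \qquad \Psi(\omega) = M_1\!\left(\frac{\omega}{2}\right) \prod_{k=2}^{\infty} M_0\!\left(\frac{\omega}{2^k}\right). \qquad (4.6.8)$$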



Note that the regularity of the wavelet is the same as that of the scaling func- 
tion (we assume FIR filters). Except that we define scaling functions and wavelets 
as well as their duals, the construction is analogous to the orthogonal case. The 
biorthogonality relation (4.6.1) can be derived similarly to the orthogonal case (see 
Proposition 4.4), but using properties of the underlying biorthogonal filter bank instead [58, 319]. As can be seen in the previous example, a difficult task in designing biorthogonal wavelets is to guarantee simultaneous regularity of the basis and its dual.⁸ To illustrate this point further, consider the case when one of the two wavelet bases is piecewise linear.

Figure 4.27 Biorthogonal wavelet bases. The scaling function $\varphi(t)$ is the hat function or linear spline (shown in Figure 4.12(b)). (a) Biorthogonal scaling function $\tilde{\varphi}(t)$ based on a length-5 filter. (b) Biorthogonal scaling function $\tilde{\varphi}'(t)$ based on a length-9 filter. (c) Wavelet $\psi'(t)$, which is piecewise linear. (d) Dual wavelet $\tilde{\psi}'(t)$.



Example 4.7 Piecewise Linear Biorthogonal Wavelet Bases 

Choose $G_0(z) = \frac{1}{2\sqrt{2}}(z + 2 + z^{-1})$. It can be verified that the associated scaling function $\varphi(t)$ is the triangle function or linear B-spline. Now, we have to choose $H_0(z)$ so that (i) (4.6.4) is satisfied, (ii) $H_0(-1) = 0$, and (iii) $\tilde{\varphi}(t)$ has some regularity. First, choose

$$H_0(z) = \frac{1}{4\sqrt{2}}\left(-z^2 + 2z + 6 + 2z^{-1} - z^{-2}\right) = \frac{1}{4\sqrt{2}}(1 + z)(1 + z^{-1})(-z + 4 - z^{-1}),$$

which satisfies (i) and (ii) above. As for regularity, we show the iterated filter $H_0^{(i)}(z)$ in Figure 4.27(a), leading to an approximation of $\tilde{\varphi}(t)$. As can be seen, the dual scaling function is very "spiky". Instead, we can take a higher-order analysis lowpass filter, in particular one having more zeros at $z = -1$. For example, using

$$H_0'(z) = \frac{1}{64\sqrt{2}}(1 + z)^2(1 + z^{-1})^2\left(3z^2 - 18z + 38 - 18z^{-1} + 3z^{-2}\right)$$

leads to a smoother dual scaling function $\tilde{\varphi}'(t)$, as shown in Figure 4.27(b). The wavelet $\psi'(t)$ and its dual $\tilde{\psi}'(t)$ are shown in Figure 4.27(c) and (d). Note that both of these examples are simply refactorizations of the autocorrelation of the Daubechies' filters for $N = 2$ and $3$, respectively (see Table 4.2).

⁸ Regularity of both the wavelet and its dual is not necessary. Actually, they can be very different and still form a valid biorthogonal expansion.
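Condition (i) can be checked exactly. We assume, as in the orthogonal case, that (4.6.4) amounts to the half-band condition $P(z) + P(-z) = 2$ on the product $P(z) = G_0(z)H_0(z)$. Both filters here are symmetric, so time reversal does not matter. Each filter carries one factor $1/\sqrt{2}$, so $P(z)$ has rational taps, and the sketch works with exact fractions. The half-band condition then says that the center tap of $P$ is 1 and every other even-offset tap is 0.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Multiply two tap lists (same power ordering for both)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def product_filter(g_int, g_den, h_int, h_den):
    """P(z) = G0(z) H0(z) for filters written as (integer taps) / (den * sqrt(2)).
    The two sqrt(2) factors multiply to 2, so P has rational taps."""
    scale = Fraction(1, 2 * g_den * h_den)
    return [scale * c for c in poly_mul(g_int, h_int)]


def is_halfband(p):
    """P(z) + P(-z) = 2 for an odd-length P centred at z^0: the centre tap is 1
    and all other taps at even offsets from the centre vanish."""
    centre = len(p) // 2
    return all(p[n] == (1 if n == centre else 0)
               for n in range(len(p)) if (n - centre) % 2 == 0)


g0 = [1, 2, 1]                       # G0  = (z + 2 + z^-1) / (2 sqrt 2)
h0 = [-1, 2, 6, 2, -1]               # H0  = taps / (4 sqrt 2)
h0_long = poly_mul(poly_mul([1, 2, 1], [1, 2, 1]), [3, -18, 38, -18, 3])  # H0' = taps / (64 sqrt 2)

p_short = product_filter(g0, 2, h0, 4)
p_long = product_filter(g0, 2, h0_long, 64)
print([str(c) for c in p_short])
print([str(c) for c in p_long])
print(is_halfband(p_short), is_halfband(p_long))
```

The two products come out as $[-1, 0, 9, 16, 9, 0, -1]/16$ and $[3, 0, -25, 0, 150, 256, 150, 0, -25, 0, 3]/256$. These are the Daubechies product filters for $N = 2$ and $3$ mentioned above.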

Given the vastly different behavior of the wavelet and its dual, a natural question that comes to mind is which of the two decomposition formulas, (4.6.2) or (4.6.3), should be used. If all wavelet coefficients are used, and we are not worried about the speed of convergence of the wavelet series, then it does not matter. However, if approximations are to be used (as in image compression), then the two formulas can exhibit different behavior.

First, zero moments of the analyzing wavelet will tend to reduce the number of significant wavelet coefficients (see Section 4.5.2), and thus one should use the wavelet with many zeros at $\omega = 0$ for the analysis. Since $\tilde{\Psi}(\omega)$ involves $H_1(e^{j\omega})$ (see (4.6.7-4.6.8)) and $H_1(z)$ is related to $G_0(-z)$, zeros at the origin for $\tilde{\Psi}(\omega)$ correspond to zeros at $\omega = \pi$ for $G_0(e^{j\omega})$. Thus many zeros at $z = -1$ in $G_0(z)$ will give the same number of zero moments for $\tilde{\psi}(t)$ and contribute to a more compact representation of smooth signals.

Second, the reconstructed signal is a linear combination of the synthesis wavelet and its scaled and translated versions. If not all coefficients are used in the reconstruction, a subset of wavelets should give a "close" approximation to the signal. In general, smooth wavelets will give a better approximation (for example, in a perceptual sense for image compression). Again, smooth wavelets at the synthesis are obtained by having many zeros at $z = -1$ in $G_0(z)$.

In practice, it turns out that (4.6.2) and (4.6.3) indeed lead to different behavior (for example, in image compression). Usually, the schemes having a smooth synthesis scaling function and wavelet are preferred [14].
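Since the argument above turns on the number of zeros at $z = -1$, the sketch below counts them for the filters of Example 4.7. It uses exact synthetic division by $(1 + z)$. Negative powers of $z$ are handled implicitly, because multiplying a filter by a power of $z$ does not move its roots at $z = -1$, so only the tap list is needed. Following the text, the count for $G_0(z)$ is the number of zero moments of the analysis wavelet $\tilde{\psi}(t)$. By the same argument with the roles of the filters exchanged, the count for the analysis lowpass relates to the synthesis wavelet.

```python
def divide_by_x_plus_1(p):
    """Synthetic division of p(x) (taps in increasing powers) by (x + 1).
    Returns (quotient, remainder); the remainder equals p(-1)."""
    d = len(p) - 1
    q = [0] * d
    acc = 0
    for k in range(d, 0, -1):
        acc = p[k] - acc  # b_{k-1} = p_k - b_k, with b_d = 0
        q[k - 1] = acc
    return q, p[0] - acc


def zeros_at_minus_one(p):
    """Multiplicity of the root z = -1 of a filter with integer taps."""
    count = 0
    while len(p) > 1:
        q, r = divide_by_x_plus_1(p)
        if r != 0:
            break
        p, count = q, count + 1
    return count


filters = {
    "G0  (synthesis, Example 4.7)": [1, 2, 1],
    "H0  (analysis, length 5)": [-1, 2, 6, 2, -1],
    "H0' (analysis, length 9)": [3, -6, -16, 38, 90, 38, -16, -6, 3],
}
for name, taps in filters.items():
    print(f"{name}: {zeros_at_minus_one(taps)} zero(s) at z = -1")
```

The triangle-function $G_0(z)$ has a double zero at $z = -1$. The length-5 analysis filter also has two, and the length-9 filter $H_0'(z)$ has four, consistent with the factorizations given in Example 4.7.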

This concludes our brief overview of biorthogonal wavelet constructions based 
on filter banks. For more material on this topic, please refer to [58] (which proves 
completeness of the biorthogonal basis under certain conditions on the filters), [289] 
(which discusses general properties of biorthogonal wavelet bases) and [130, 319] 
(which explores further properties of biorthogonal filter banks useful for designing 
biorthogonal wavelets). 
4.6.2 Recursive Filter Banks and Wavelets with Exponential Decay

In Section 3.2.5, filter banks using recursive or IIR filters were discussed. Just 
like their FIR counterparts, such filter banks can be used to generate wavelets 
by iteration [130, 133]. We will concentrate on the orthogonal case, noting that 
biorthogonal solutions are possible as well. 

We start with a valid autocorrelation $P(z)$, that is, $P(z) + P(-z) = 2$, but where $P(z)$ is now a ratio of polynomials. The general form of such a $P(z)$ is given in (3.2.75), and a distinctive feature is that the denominator is a function of $z^2$. Given a valid $P(z)$, we can take one of its spectral factors. Call this spectral factor $G_0(z)$ and use it as the lowpass synthesis filter in an orthogonal recursive filter bank. The other filters follow as usual (see (3.2.76-3.2.77)), and we assume that there is no additional allpass component (this only increases complexity, but does not improve frequency selectivity).

Assume that $G_0(z)$ has at least one zero at $z = -1$ and define $M_0(\omega) = \frac{1}{\sqrt{2}}\, G_0(e^{j\omega})$ (thus ensuring that $M_0(0) = 1$). As usual, we can define the iterated filter $G_0^{(i)}(z)$ (4.4.9) and the graphical function $\varphi^{(i)}(t)$ (4.4.10). Assuming convergence of the graphical function, the limit will be a scaling function $\varphi(t)$ just as in the FIR case. The two-scale equation property holds (see (4.4.20)), the only difference being that now an infinite number of $\varphi(2t - n)$'s are involved.

An interesting question arises: What are the maximally flat IIR filters, or the equivalent of the Daubechies' filters? This question has been studied by Herley, who gave the class of solutions and the associated wavelets [130, 133]. Such maximally flat IIR filters lead to scaling functions and wavelets with high regularity and exponential decay in the time domain. Because IIR filters have better frequency selectivity than FIR filters for a given computational complexity, it turns out that wavelets based on IIR filters offer better frequency selectivity as well. Interestingly, the most regular wavelets obtained with this construction are based on very classic filters, namely Butterworth filters (see Examples 2.2 and 3.6).

Example 4.8 Wavelets based on Butterworth Filters 

The general form of the autocorrelation $P(z)$ of a half-band digital Butterworth filter is given in (3.2.78). Choose $N = 5$ and the spectral factorization of $P(z)$ given in (3.2.79-3.2.80). Then, the corresponding scaling function and wavelet (actually, an approximation based on the sixth iteration) are shown in Figure 4.28. These functions have better regularity (twice differentiable) than the corresponding Daubechies' wavelets but do not have compact support.

The Daubechies' and Butterworth maximally flat filters are two extreme cases of solving for a minimum degree autocorrelation $R(z)$ such that

$$(1 + z)^N (1 + z^{-1})^N R(z) + (1 - z)^N (1 - z^{-1})^N R(-z) = 2$$

is satisfied. In the Daubechies' solution, $R(z)$ has zeros only, while in the Butterworth case, $R(z)$ is all-pole. For $N > 4$, there are intermediate solutions where $R(z)$ has both poles and zeros, and these are described in [130, 133]. The regularity of the associated wavelets is very close to the Butterworth case and thus better than that of the corresponding Daubechies' wavelets.
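Both extremes can be checked numerically on the unit circle. The zeros-only factors below are the Daubechies ones implied by Example 4.7 ($N = 2$ and $3$). The all-pole factor $R(z) = 2/[(z + 2 + z^{-1})^N + (2 - z - z^{-1})^N]$ is our own choice. It satisfies the equation identically, because $R(-z)$ has the same denominator. On the unit circle it gives the half-band Butterworth-type magnitude $2\cos^{2N}(\omega/2)/(\cos^{2N}(\omega/2) + \sin^{2N}(\omega/2))$. Its exact correspondence with (3.2.78) is an assumption, not something reproduced here.

```python
import cmath
import math


def lhs(R, N, w):
    """(1+z)^N (1+z^-1)^N R(z) + (1-z)^N (1-z^-1)^N R(-z) evaluated at z = e^{jw}."""
    z = cmath.exp(1j * w)
    return ((1 + z) ** N * (1 + 1 / z) ** N * R(z)
            + (1 - z) ** N * (1 - 1 / z) ** N * R(-z))


def daubechies_R2(z):
    # Zeros only (N = 2): the extra factor of the length-5 H0 in Example 4.7, over 16.
    return (-z + 4 - 1 / z) / 16


def daubechies_R3(z):
    # Zeros only (N = 3): the extra factor of the length-9 H0' in Example 4.7, over 256.
    return (3 * z ** 2 - 18 * z + 38 - 18 / z + 3 / z ** 2) / 256


def all_pole_R(N):
    # All poles (assumed Butterworth-type choice): R(-z) shares this denominator,
    # so the two terms of the equation add up to exactly 2.
    def R(z):
        return 2 / ((z + 2 + 1 / z) ** N + (2 - z - 1 / z) ** N)
    return R


grid = [2 * math.pi * k / 257 for k in range(257)]
cases = [("Daubechies N=2", daubechies_R2, 2),
         ("Daubechies N=3", daubechies_R3, 3),
         ("all-pole N=5", all_pole_R(5), 5)]
errors = {name: max(abs(lhs(R, N, w) - 2) for w in grid) for name, R, N in cases}
for name, err in errors.items():
    print(f"{name}: max |lhs - 2| = {err:.1e}")
```

All three deviations are at rounding level. The zeros-only and all-pole choices are genuinely different solutions of the same equation.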

The freedom gained by going from FIR to IIR filters allows the construction of orthogonal wavelets with symmetries or linear phase. This case is excluded for FIR filters, that is, for compactly supported wavelets (except for the Haar wavelet). Orthogonal IIR filter banks having linear phase filters were briefly discussed in Section 3.2.5. In particular, the example derived in (3.2.81-3.2.82) is relevant for wavelet constructions. Take synthesis filters $G_0(z) = \frac{1}{\sqrt{2}}\left[A(z^2) + z^{-1}A(z^{-2})\right]$ and $G_1(z) = G_0(-z)$ (similar to (3.2.81)), with $A(z)$ the allpass given in (3.2.82). Then

$$G_0(z) = \frac{1}{\sqrt{2}}\,\frac{(1 + z^{-1})^5\left(49 - 20z^{-1} + 198z^{-2} - 20z^{-3} + 49z^{-4}\right)}{\left(15 + 42z^{-2} + 7z^{-4}\right)\left(7 + 42z^{-2} + 15z^{-4}\right)}$$

has linear phase and five zeros at $z = -1$. It leads, through iteration, to a smooth, differentiable scaling function and wavelet with exponential decay (but obviously noncausal).
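The displayed $G_0(z)$ can be rebuilt from its allpass and checked. The coefficients of $A(z)$ below are our assumption: $A(z) = (7 + 42z^{-1} + 15z^{-2})/(15 + 42z^{-1} + 7z^{-2})$. They were reconstructed so that $A(z^2) + z^{-1}A(z^{-2})$ reproduces the displayed numerator and denominator, since (3.2.82) is not repeated here. The sketch verifies the factor $(1 + z^{-1})^5$ exactly with integer polynomials, in powers of $x = z^{-1}$. It then checks linear phase (palindromic numerator and denominator) and, numerically, the orthogonality condition $|G_0(e^{j\omega})|^2 + |G_0(e^{j(\omega+\pi)})|^2 = 2$.

```python
import cmath
import math


def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_add(a, b):
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)]


def upsample2(p):
    """p(x) -> p(x^2); all polynomials here are in x = z^-1, increasing powers."""
    out = [0] * (2 * len(p) - 1)
    out[::2] = p
    return out


# Assumed allpass: A(z^2) = u/v and A(z^-2) = v/u with the polynomials below.
u = upsample2([7, 42, 15])   # 7 + 42 x^2 + 15 x^4
v = upsample2([15, 42, 7])   # 15 + 42 x^2 + 7 x^4

# A(z^2) + z^-1 A(z^-2) = (u^2 + x v^2) / (u v)
numerator = poly_add(poly_mul(u, u), [0] + poly_mul(v, v))
denominator = poly_mul(u, v)


def evaluate(p, x):
    return sum(c * x ** n for n, c in enumerate(p))


def G0(w):
    """G0 = [A(z^2) + z^-1 A(z^-2)] / sqrt(2) evaluated at z = e^{jw}."""
    x = cmath.exp(-1j * w)
    return evaluate(numerator, x) / evaluate(denominator, x) / math.sqrt(2)


claimed = poly_mul([1, 5, 10, 10, 5, 1], [49, -20, 198, -20, 49])  # (1+x)^5 (49 - 20x + ...)
grid = [math.pi * k / 200 for k in range(200)]
pc_error = max(abs(abs(G0(w)) ** 2 + abs(G0(w + math.pi)) ** 2 - 2) for w in grid)
print("numerator matches (1 + z^-1)^5 (49 - 20 z^-1 + ...):", numerator == claimed)
print("linear phase (palindromic num/den):",
      numerator == numerator[::-1] and denominator == denominator[::-1])
print(f"|G0(1)| = {abs(G0(0.0)):.6f}, power-complementarity error = {pc_error:.1e}")
```

With the $1/\sqrt{2}$ factor, $G_0(1) = \sqrt{2}$, as required for an orthogonal lowpass. Without it, the same construction would give $G_0(1) = 2$.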

4.6.3 Multichannel Filter Banks and Wavelet Packets 

Consider the tree-structured filter bank case first and assume that the lowpass filter $g_0[n]$ is regular and orthogonal to its even translates. Thus, there is a limit function $\varphi(t)$ which satisfies a two-scale equation. However, by iteration, $\varphi(t)$ also satisfies two-scale equations with scale changes by any power of 2. The linear combination is given by the iterated filter $g_0^{(i)}[n]$:

$$\varphi(t) = 2^{i/2}\sum_k g_0^{(i)}[k]\,\varphi(2^i t - k).$$

Then, we can design different "wavelet" bases based on iterated low and highpass filters. Let us take a simple example. Consider the following four filters, corresponding to a four-channel filter bank derived from a binary tree:

$$F_0(z) = G_0(z)G_0(z^2), \qquad F_1(z) = G_0(z)G_1(z^2), \tag{4.6.9}$$

$$F_2(z) = G_1(z)G_0(z^2), \qquad F_3(z) = G_1(z)G_1(z^2). \tag{4.6.10}$$

This corresponds to an orthogonal filter bank as we had seen in Section 3.3. Call the impulse responses $f_i[n]$. Then, the following $\varphi(t)$ is a scaling function (with scale change by 4):

$$\varphi(t) = 2\sum_k f_0[k]\,\varphi(4t - k).$$
Figure 4.28 Scaling function $\varphi(t)$ and wavelet $\psi(t)$ based on a half-band digital Butterworth filter with five zeros at $\omega = \pi$. (a) Scaling function $\varphi(t)$. (b) Fourier transform magnitude $|\Phi(\omega)|$. (c) Wavelet $\psi(t)$. (d) Fourier transform magnitude $|\Psi(\omega)|$.



Note that $\varphi(t)$ is just the usual scaling function from the iterated two-channel bank, but now written with respect to a scale change by 4 (which involves the filter $f_0[k]$). The following three functions are "wavelets":

$$\psi_i(t) = 2\sum_k f_i[k]\,\varphi(4t - k), \qquad i \in \{1, 2, 3\}.$$

The set $\{\varphi(t - k), \psi_1(t - l), \psi_2(t - m), \psi_3(t - n)\}$ is orthonormal. Following similar arguments as in the classic "single" wavelet case, $\{2^j \psi_i(4^j t - l_i)\}$, with $i \in \{1, 2, 3\}$ and $l_i, j \in \mathbb{Z}$, is an orthonormal basis for $L^2(\mathbb{R})$: we have simply expanded two successive wavelet spaces into three spaces spanned by $\psi_i(t)$, $i \in \{1, 2, 3\}$. Of course, this is a simple variation on the normal wavelet case (note that $\psi_1(t)$ is the usual wavelet).
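The orthonormality of the tree-structured filters (4.6.9)-(4.6.10) with respect to shifts by 4 is easy to confirm numerically. This sketch uses the four-tap Daubechies lowpass as an example $G_0$. It forms $G_1$ by the usual alternating flip $g_1[n] = (-1)^n g_0[L - 1 - n]$, a convention we assume here. It also checks that $\sum_k f_0[k] = 2$, which is the normalization implied by the factor 2 in the two-scale equation with scale change 4.

```python
import math


def conv(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def upsample(h, m):
    """Taps of H(z^m)."""
    out = [0.0] * ((len(h) - 1) * m + 1)
    out[::m] = h
    return out


s3, r2 = math.sqrt(3.0), math.sqrt(2.0)
g0 = [(1 + s3) / (4 * r2), (3 + s3) / (4 * r2), (3 - s3) / (4 * r2), (1 - s3) / (4 * r2)]
g1 = [(-1) ** n * g0[len(g0) - 1 - n] for n in range(len(g0))]  # assumed alternating flip

# F_i(z): first-stage filter times second-stage filter evaluated at z^2.
f = [conv(a, upsample(b, 2)) for a, b in ((g0, g0), (g0, g1), (g1, g0), (g1, g1))]


def inner(a, b, shift):
    """<a[n], b[n - shift]> for finite sequences starting at n = 0."""
    return sum(a[n] * b[n - shift] for n in range(len(a)) if 0 <= n - shift < len(b))


def max_gram_error(filters, step=4):
    """Largest deviation of <f_i[n - step k], f_j[n - step l]> from delta[i-j] delta[k-l]."""
    worst = 0.0
    L = max(len(h) for h in filters)
    for i, a in enumerate(filters):
        for j, b in enumerate(filters):
            for k in range(-(L // step) - 1, L // step + 2):
                target = 1.0 if (i == j and k == 0) else 0.0
                worst = max(worst, abs(inner(a, b, step * k) - target))
    return worst


print(f"max deviation from orthonormality: {max_gram_error(f):.1e}")
print(f"sum of f0 taps: {sum(f[0]):.6f}")
```

Replacing `g0` with any other orthogonal lowpass (and regenerating `g1`) gives the corresponding four-channel wavelet packet filters.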




With these methods and the previously discussed concept of wavelet packets in Section 3.3.4, it can be seen how to obtain continuous-time wavelet packets. That is, given any binary tree built with two-channel filter banks, we can associate a set of "wavelets" with the highpass and bandpass channels. These functions, together with appropriate scales and shifts, will form orthonormal wavelet packet bases for $L^2(\mathbb{R})$.

The case for general filter banks is very similar [129, 277]. Assume we have a size-$N$ filter bank with a regular lowpass filter. This filter has to be regular with respect to downsampling by $N$ (rather than 2). In a similar fashion to Proposition 4.7, this amounts to having a sufficient number of zeros at the $N$th roots of unity (the aliasing frequencies, see discussion below). The lowpass filter will lead to a scaling function satisfying

$$\varphi(t) = N^{1/2}\sum_k g_0[k]\,\varphi(Nt - k).$$

The $N - 1$ functions

$$\psi_i(t) = N^{1/2}\sum_k g_i[k]\,\varphi(Nt - k), \qquad i = 1, \ldots, N - 1,$$

will form a wavelet basis with respect to scale changes by $N$.

Let us consider the issue of regularity for multichannel filter banks. It is clear 
that if a regular two-channel filter bank is cascaded a finite number of times in order 
to obtain wavelet packets (as was done above in (4.6.9-4.6.10)), then regularity of 
the lowpass filter is necessary and sufficient in order to obtain regular wavelet 
packets. This follows since the scaling function is the same and the wavelet packets 
are finite linear combinations of scaling functions. In the more general case of a 
filter bank with $N$ channels, we have to test the regularity of the lowpass filter $G_0(z)$ with respect to sampling rate changes by $N$. That is, we are interested in the behavior of the iterated filter $G_0^{(i)}(z)$,

$$G_0^{(i)}(z) = \prod_{k=0}^{i-1} G_0\!\left(z^{N^k}\right), \tag{4.6.11}$$

and the associated graphical function

$$\varphi^{(i)}(t) = N^{i/2}\, g_0^{(i)}[n], \qquad \frac{n}{N^i} \le t < \frac{n+1}{N^i}. \tag{4.6.12}$$



Since the filter $G_0(z)$ is orthogonal with respect to translation by multiples of $N$, it satisfies (see (3.4.11))

$$\sum_{k=0}^{N-1} G_0\!\left(e^{j(\omega + 2\pi k/N)}\right) G_0\!\left(e^{-j(\omega + 2\pi k/N)}\right) = N. \tag{4.6.13}$$
A necessary condition for convergence of the graphical function is that (see Problem 4.15)

$$G_0\!\left(e^{j2\pi k/N}\right) = 0, \qquad k = 1, \ldots, N - 1, \tag{4.6.14}$$

that is, $G_0(z)$ has at least one zero at each of the aliasing frequencies $\omega = 2\pi k/N$, $k = 1, \ldots, N - 1$. Then, using (4.6.14) in (4.6.13) at $\omega = 0$, we see that

$$G_0(1) = \sqrt{N}.$$

Introducing a normalized version of the lowpass filter,

$$M_0(\omega) = \frac{1}{\sqrt{N}}\, G_0(e^{j\omega}),$$

and assuming convergence, it follows that the Fourier transform of the scaling function equals

$$\Phi(\omega) = \prod_{k=1}^{\infty} M_0\!\left(\frac{\omega}{N^k}\right).$$

A sufficient condition for the convergence of the graphical function (4.6.12) to a continuous function can be derived very similarly to Proposition 4.7. Write

$$M_0(\omega) = \left(\frac{1 + e^{j\omega} + \cdots + e^{j(N-1)\omega}}{N}\right)^{K} R(\omega),$$

where $K \ge 1$ because of the necessary condition for convergence, and call

$$B = \sup_{\omega \in [0, 2\pi]} |R(\omega)|.$$

Then

$$B < N^{K-1} \tag{4.6.15}$$

ensures that the limit of $\varphi^{(i)}(t)$ as $i \to \infty$ is continuous (see Problem 4.16).
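The test (4.6.15) can be sketched directly. The code peels $K$ factors $(1 + x + \cdots + x^{N-1})/N$ off $M_0$ by exact polynomial division, then estimates $B = \sup |R(\omega)|$ by sampling $\omega$ on a grid. The grid sampling is an approximation of the supremum, chosen here for simplicity. Two test cases are used:

- the four-tap Daubechies filter ($N = 2$, $K = 2$), for which $R$ is first order;
- a three-band squared box filter, which is not an orthogonal filter and is used only because its $R$ is exactly 1.

The helper names are ours.

```python
import cmath
import math


def divide_by_box(p, N):
    """Divide p(x) by 1 + x + ... + x^(N-1) (taps in increasing powers of x).
    Raises ValueError if the division is not (numerically) exact."""
    q = []
    for n in range(len(p) - N + 1):
        q.append(p[n] - sum(q[n - m] for m in range(1, N) if n - m >= 0))
    back = [sum(q[n - m] for m in range(N) if 0 <= n - m < len(q)) for n in range(len(p))]
    if max(abs(a - b) for a, b in zip(back, p)) > 1e-9:
        raise ValueError("not divisible by 1 + x + ... + x^(N-1)")
    return q


def regularity_bound(m0, N, K, samples=4096):
    """Return (B, N^(K-1), B < N^(K-1)) for M0 = ((1 + ... + x^(N-1))/N)^K R.
    m0 holds the taps of M0 (they sum to 1); B is sampled on a uniform grid."""
    r = list(m0)
    for _ in range(K):
        r = [N * c for c in divide_by_box(r, N)]
    B = max(abs(sum(c * cmath.exp(1j * w * n) for n, c in enumerate(r)))
            for w in (2 * math.pi * k / samples for k in range(samples)))
    return B, N ** (K - 1), B < N ** (K - 1)


s3 = math.sqrt(3.0)
daub4 = [(1 + s3) / 8, (3 + s3) / 8, (3 - s3) / 8, (1 - s3) / 8]  # M0 = G0 / sqrt(2)
box3_squared = [c / 9 for c in [1, 2, 3, 2, 1]]                   # ((1 + x + x^2)/3)^2

print("Daubechies 4-tap, N=2, K=2:", regularity_bound(daub4, N=2, K=2))
print("squared box, N=3, K=2:", regularity_bound(box3_squared, N=3, K=2))
```

For the four-tap Daubechies filter the sketch gives $B = \sqrt{3} < 2$, so the sufficient condition holds. For the squared box, $R = 1$ and $B = 1 < 3$.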

The design of lowpass filters with a maximum number of zeros at aliasing fre- 
quencies (the equivalent of the Daubechies' filters, but for integer downsampling 
larger than 2) is given in [277]. An interesting feature of multichannel wavelet 
schemes is that now, orthogonality and compact support are possible simultane- 
ously. This follows from the fact that there exist unitary FIR filter banks having 
linear phase filters for more than two channels [321]. A detailed exploration of 
such filter banks and their use for the design of orthonormal wavelet bases with 
symmetries (for example, a four-band filter bank leading to one symmetric scaling 
function as well as one symmetric and two antisymmetric wavelets) is done in [275]. 

The problem with scale changes by $N > 2$ is that the resolution steps between a scale and the next coarser scale are even larger than for the typical "octave-band" wavelet analysis. A finer resolution change could be obtained for rational scale changes between 1 and 2. In discrete time, such finer steps can be achieved with filter banks having rational sampling rates [166]. The situation is more complicated in continuous time. In particular, the iterated filter bank method does not lead to wavelets in the same sense as for the integer-band case. Yet, orthonormal bases can be constructed which have a similar behavior to wavelets [33]. A direct wavelet construction with rational dilation factors is possible [16], but the coefficients of the resulting two-scale equation do not correspond to either FIR or IIR filters.

4.7 Multidimensional Wavelets 

In Chapter 3, we have seen that, driven by applications such as image compression, some of the concepts from the theory of one-dimensional filter banks have been extended to multiple dimensions. Hence, this section can be seen as a generalization of both Section 3.6 and the concepts introduced in this chapter.

An easy way to construct two-dimensional wavelets, for example, is to use tensor products of their one-dimensional counterparts. This results, as will be seen later, in one scaling function and three different "mother" wavelets. Since scale change is now represented by matrices, the scaling matrix in this case will be $2I$, that is, each dimension is dilated by 2.

As for multidimensional filter banks, a truly multidimensional treatment of wavelets offers several advantages. First, one can still have a diagonal dilation (scaling) matrix and yet design nonseparable (irreducible) scaling functions and wavelets. Then, a scale change of $\sqrt{2}$, for example, is possible. This leads to one scaling function and one wavelet, a true two-dimensional counterpart of the well-known one-dimensional dyadic case. However, unlike for the filter banks, matrices used for dilation are more restricted in that one requires dilation in each dimension.

As in one dimension, the powerful connection with filter banks (through the method of iterated filter banks) can be exploited to design multidimensional wavelets. However, the task is more complicated due to incomplete cascade structures and the difficulty of imposing a zero of a particular order at aliasing frequencies. Regularity is much harder to achieve, and to date, orthonormal families with arbitrarily high regularity have not been found. In the biorthogonal case, transformations of one-dimensional perfect reconstruction filter banks into multidimensional ones can be used to design multidimensional wavelets by iteration.

4.7.1 Multiresolution Analysis and Two-Scale Equation 

The axiomatic definition of a multiresolution analysis is easily generalized: The subspaces $V_j$ in (4.2.1) are now subspaces of $L^2(\mathbb{R}^n)$, and scaling is represented by a matrix $D$. This matrix has to be well-behaved, that is,

$$D\mathbb{Z}^n \subset \mathbb{Z}^n, \qquad |\lambda_i| > 1 \quad \forall i.$$

The first condition requires $D$ to have integer entries. The second one states that all the eigenvalues $\lambda_i$ of $D$ must be strictly greater than 1 in magnitude, in order to ensure dilation in each dimension. For example, in the quincunx case, the matrix $D_Q$ from (3.B.2)

$$D_Q = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \tag{4.7.1}$$

as well as

$$D_{Q_1} = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix},$$

are both valid matrices, while

$$D_{Q_2} = \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}$$

is not, since it dilates only one dimension. Matrix $D_Q$ from (4.7.1) is a so-called "symmetry" dilation matrix, used in [163], while $D_{Q_1}$ is termed a "rotation" matrix, used in [57]. As will be seen shortly, although both of these matrices represent the same lattice, they are fundamentally different when it comes to constructing wavelets.

For the case obtained as a tensor product, the dilation matrix is diagonal. Specifically, in two dimensions, it is the matrix $D_S$ from (3.B.1)

$$D_S = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}. \tag{4.7.2}$$

The number of wavelets is determined by the number of cosets of $D\mathbb{Z}^n$, or

$$|\det(D)| - 1 = N - 1,$$

where $N$ represents the downsampling rate of the underlying filter bank. Thus, in the quincunx case, we have one "mother" wavelet, while in the $2 \times 2$ separable case (4.7.2), there are three "mother" wavelets $\psi_1, \psi_2, \psi_3$.
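The two conditions on $D$ and the wavelet count $|\det(D)| - 1$ can be checked mechanically for $2 \times 2$ matrices. The sketch below does this for the matrices discussed above. The entries of $D_{Q_2}$ are our reading of the text, namely a matrix that dilates only one direction. It also computes $D_Q^2$ and $D_{Q_1}^2$, showing that only the first is diagonal, which is the property exploited in the next section.

```python
import cmath


def eigenvalues_2x2(D):
    (a, b), (c, d) = D
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


def check_dilation(D):
    """(is D a valid dilation matrix?, |det D| - 1) for a 2x2 matrix D.
    The wavelet count is only meaningful when D is valid."""
    integer = all(isinstance(x, int) for row in D for x in row)   # D Z^2 in Z^2
    expanding = all(abs(lam) > 1 for lam in eigenvalues_2x2(D))   # |lambda_i| > 1
    (a, b), (c, d) = D
    return integer and expanding, abs(a * d - b * c) - 1


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


matrices = {
    "D_Q": [[1, 1], [1, -1]],
    "D_Q1": [[1, -1], [1, 1]],
    "D_Q2": [[2, 1], [0, 1]],
    "D_S": [[2, 0], [0, 2]],
}
for name, D in matrices.items():
    valid, n_wavelets = check_dilation(D)
    print(f"{name}: valid={valid}, |det D| - 1 = {n_wavelets}")
print("D_Q^2  =", matmul(matrices["D_Q"], matrices["D_Q"]))
print("D_Q1^2 =", matmul(matrices["D_Q1"], matrices["D_Q1"]))
```

Both quincunx matrices have eigenvalues of magnitude $\sqrt{2}$ and give one wavelet. $D_S$ gives three wavelets, and $D_{Q_2}$ fails because one eigenvalue equals 1.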

The two-scale equation is obtained as in the one-dimensional case. For example, using $D_Q$ (we will drop the subscript when there is no risk of confusion),

$$\varphi(t) = \sqrt{2}\sum_{n \in \mathbb{Z}^2} g_0[n]\,\varphi(Dt - n),$$

that is,

$$\varphi(t_1, t_2) = \sqrt{2}\sum_{n_1, n_2} g_0[n_1, n_2]\,\varphi(t_1 + t_2 - n_1,\; t_1 - t_2 - n_2).$$

We have assumed that $\sum_n g_0[n] = \sqrt{2}$.

4.7.2 Construction of Wavelets Using Iterated Filter Banks 

Since the construction is similar to the one-dimensional case, we will concentrate 
on the quincunx dilation matrices by way of example. 

Consider again Figure 4.14 with the matrix $D_Q$ replacing upsampling by 2. Then the equivalent low branch after $i$ steps of filtering and sampling by $D_Q$ will be

$$G_0^{(i)}(\omega_1, \omega_2) = \prod_{k=0}^{i-1} G_0\!\left((D^T)^k \begin{pmatrix} \omega_1 \\ \omega_2 \end{pmatrix}\right), \tag{4.7.3}$$

where $G_0^{(0)}(\omega_1, \omega_2) = 1$. Observe here that instead of scalar powers, we are dealing with powers of matrices. Thus, for different matrices, iterated filters are going to exhibit vastly different behavior. Some of the most striking examples are multidimensional generalizations of the Haar basis, which were independently discovered by Gröchenig and Madych [123] and by Lawton and Resnikoff [172] (see next section). Now, as in the one-dimensional case, construct a continuous-time "graphical" function based on the iterated filter $g_0^{(i)}[n_1, n_2]$:

$$\varphi^{(i)}(t_1, t_2) = 2^{i/2}\, g_0^{(i)}[n_1, n_2], \qquad \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}^{i} \begin{pmatrix} t_1 \\ t_2 \end{pmatrix} - \begin{pmatrix} n_1 \\ n_2 \end{pmatrix} \in [0, 1) \times [0, 1).$$

Note that these regions are not in general rectangular. Specifically, in this case, they are squares in even iterations and diamonds (tilted squares) in odd iterations. One of the advantages of using the matrix $D_Q$ rather than $D_{Q_1}$ is that it leads to separable sampling (a diagonal matrix) in every other iteration, since $D_Q^2 = 2I$. This feature is useful because one can use certain one-dimensional results in a separable manner in even iterations. We are again interested in the limiting behavior of this "graphical" function. Let us first assume that the limit of $\varphi^{(i)}(t_1, t_2)$ exists and is in $L^2(\mathbb{R}^2)$ (we will come back later to the conditions under which it exists). Hence, we define the scaling function as

$$\varphi(t_1, t_2) = \lim_{i \to \infty} \varphi^{(i)}(t_1, t_2), \qquad \varphi(t_1, t_2) \in L^2(\mathbb{R}^2). \tag{4.7.4}$$

Once the scaling function exists, the wavelet can be obtained from the two-dimensional counterpart of (4.2.14). Again, the coefficients used in the two-scale equation and in the quincunx version of (4.2.14) are the impulse response coefficients of the lowpass and highpass filters, respectively. To prove that the wavelet obtained in such a fashion actually produces an orthonormal basis for $L^2(\mathbb{R}^2)$, one has to demonstrate various facts. The proofs of the following statements are analogous to the one-dimensional case (see Proposition 4.4), that is, they rely on the orthogonality of the underlying filter banks and the two-scale equation property [163]:

(a) $\langle \varphi(D^m t - n), \varphi(D^m t - k) \rangle = 2^{-m}\,\delta[n - k]$, that is, the scaling function is orthogonal to its translates by multiples of $D^{-m}$ at all scales.

(b) The same holds for the wavelet.

(c) $\langle \varphi(t), \psi(t - k) \rangle = 0$, that is, the scaling function is orthogonal to the wavelet and its integer translates.

(d) Wavelets are orthogonal across scales.

It follows that the set

$$S = \left\{ 2^{-m/2}\,\psi(D^{-m}t - n) \;\middle|\; m \in \mathbb{Z},\ n \in \mathbb{Z}^2,\ t \in \mathbb{R}^2 \right\}$$

is an orthonormal set. What is left to be shown is completeness, which can be done similarly to the one-dimensional case (see Theorem 4.5 and [71]).

The existence of the limit of $\varphi^{(i)}(t_1, t_2)$ was assumed. Now we give a necessary condition for its existence. Similarly to the one-dimensional case, it is necessary for the lowpass filter of the iterated filter bank to have a zero at aliasing frequencies. This condition holds in general, but will be given here for the case we have been following throughout this section, that is, the quincunx case. The proof of necessity is similar to that of Proposition 4.6.

Proposition 4.9

If the scaling function $\varphi(t_1, t_2)$ exists for some $(t_1, t_2) \in \mathbb{R}^2$, then

$$\sum_{k \in \mathbb{Z}^2} g_0[Dk + k_i] = \frac{1}{\sqrt{2}}, \qquad k_0 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad k_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \tag{4.7.5}$$

or, in other words,

$$G_0(1, 1) = \sqrt{2}, \qquad G_0(-1, -1) = 0.$$



Proof

Following (4.7.3), one can express the equivalent filter after $i$ steps in terms of the equivalent filter after $(i - 1)$ steps as

$$g_0^{(i)}[n] = \sum_k g_0[k]\, g_0^{(i-1)}[n - D^{i-1}k] = \sum_k g_0^{(i-1)}[k]\, g_0[n - Dk],$$

and thus

$$g_0^{(i)}[Dn] = \sum_k g_0[Dk]\, g_0^{(i-1)}[n - k].$$

Using (4.7.4), express $g_0^{(i-1)}$ and $g_0^{(i)}$ in terms of $\varphi^{(i-1)}$ and $\varphi^{(i)}$, and then take the limits (which we are allowed to do by assumption):

$$\varphi(Dt) = \sqrt{2}\sum_k g_0[Dk]\,\varphi(Dt). \tag{4.7.6}$$

Doing the same for $g_0^{(i)}[Dn + k_1]$, one obtains

$$\varphi(Dt) = \sqrt{2}\sum_k g_0[Dk + k_1]\,\varphi(Dt). \tag{4.7.7}$$

Equating (4.7.6) and (4.7.7), one obtains (4.7.5).

Now, a single zero at aliasing frequency is in general not sufficient to ensure reg- 
ularity. Higher-order zeros have led to regular scaling functions and wavelets, but 
the precise relationship is a topic of current research. 
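For the quincunx lattice, the condition (4.7.5) splits the taps of $g_0$ into two cosets. The lattice $D_Q\mathbb{Z}^2$ is the set of points with $n_1 + n_2$ even, and $k_1 = (1, 0)^T$ represents the odd coset. The condition requires each coset to sum to $1/\sqrt{2}$. The sketch below checks this for filters stored as a dictionary from $(n_1, n_2)$ to tap value, which is our own representation. It confirms the equivalent form $G_0(1, 1) = \sqrt{2}$, $G_0(-1, -1) = 0$. The two-tap examples are only illustrations.

```python
import math


def coset_sums(g):
    """Sums of the taps of a 2-D filter over the two cosets of the quincunx
    lattice {(n1, n2): n1 + n2 even}; g maps (n1, n2) -> tap value."""
    even = sum(v for (n1, n2), v in g.items() if (n1 + n2) % 2 == 0)
    odd = sum(v for (n1, n2), v in g.items() if (n1 + n2) % 2 != 0)
    return even, odd


def satisfies_prop_49(g, tol=1e-12):
    """Necessary condition (4.7.5): both coset sums equal 1/sqrt(2)."""
    target = 1 / math.sqrt(2)
    return all(abs(s - target) < tol for s in coset_sums(g))


def G0_at(g, z1, z2):
    """G0(z1, z2) = sum g[n1, n2] z1^n1 z2^n2 (real z1, z2 here)."""
    return sum(v * z1 ** n1 * z2 ** n2 for (n1, n2), v in g.items())


a = 1 / math.sqrt(2)
two_cosets = {(0, 0): a, (1, 0): a}   # one tap in each coset
same_coset = {(0, 0): a, (1, 1): a}   # both taps on the lattice itself

for name, g in (("taps in different cosets", two_cosets), ("taps in same coset", same_coset)):
    print(f"{name}: coset sums {coset_sums(g)}, condition holds: {satisfies_prop_49(g)}, "
          f"G0(1,1) = {G0_at(g, 1, 1):.4f}, G0(-1,-1) = {G0_at(g, -1, -1):.4f}")
```

The filter with both taps on the lattice has $G_0(-1, -1) = \sqrt{2} \neq 0$. Its iteration therefore cannot converge to a scaling function, by Proposition 4.9.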

4.7.3 Generalization of Haar Basis to Multiple Dimensions 

The material in this section is based on the work of Lawton and Resnikoff [172], and Gröchenig and Madych [123]. The results are stated in the form given in [123].

Recall the Haar basis introduced at the beginning of this chapter, and recall that the associated scaling function is 1 over the interval $[0, 1)$ and 0 otherwise. In other words, this scaling function can be viewed as the characteristic function of the set $Q = [0, 1)$. Together with its integer translates, the Haar scaling function "covers" the real line. The idea is to construct analogous multidimensional generalized Haar bases. These would have, as scaling functions, characteristic functions of appropriate sets, with dilation replaced by a suitable linear transformation.

The approach in [123] consists of finding a characteristic function of a compact set $Q$ that would be the scaling function for an appropriate multiresolution analysis. Then, to find the wavelets, one would use the standard techniques. An interesting property of such scaling functions is that they form self-similar tilings of $\mathbb{R}^n$. This is not an obvious feature for some scaling functions of exotic shapes.

The algorithm for constructing a scaling function for multiresolution analysis 
with matrix dilation D basically states that one takes a set of points belonging to 
different cosets of the lattice and forms a discrete filter being 1 on these points. The 
filter is then iterated as explained earlier. If it converges, we obtain an example 
of a generalized Haar wavelet. For a more formal definition of the algorithm, the 
reader is referred to [123]. For example, in the quincunx case, the set of points of 
coset representatives would consist of only two elements (since the quincunx lattice has only two cosets), and its elements would represent the two taps of the lowpass filter. Thus, the corresponding subband schemes would consist of two-tap filters. The algorithm, when it converges, can be interpreted as the iteration of a lowpass filter with only two nonzero taps, each equal to one and each in a different coset. This iteration converges to the characteristic function of some compact set, just as the one-dimensional Haar filter converged to the indicator function of the unit interval.

A very interesting scaling function is obtained when using the "rotation" matrix $D_{Q_1}$ (see (4.7.1)) and points $\{(0, 0), (1, 0)\}$, that is, the lowpass filter with $g_0[0, 0] = g_0[1, 0] = 1$, and 0 otherwise. Iterating this filter leads to the "twin-dragon" case [190], as given in Figure 4.29. Note that $\varphi(t) = 1$ over the white region and 0 otherwise. The wavelet is 1 and $-1$ in the white and black regions, respectively, and 0 otherwise. Note also how the wavelet is formed by two "scaled" scaling functions, as required by the two-dimensional counterpart of (4.2.9), and how this fractal shape tiles the space.

Figure 4.29 (a) The twin-dragon scaling function. The function is 1 in the white area and 0 otherwise. (b) The twin-dragon wavelet. The function is 1 in the white area, $-1$ in the black area, and 0 otherwise.
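The twin-dragon iteration can be sketched with the recursion $g_0^{(i)}[n] = \sum_k g_0^{(i-1)}[k]\, g_0[n - Dk]$ from the proof of Proposition 4.9, using $D = D_{Q_1}$. Here the two taps are normalized to $1/\sqrt{2}$ so that $\sum g_0 = \sqrt{2}$, matching the convention of the two-scale equation. Every support point is hit exactly once, so all taps of $g_0^{(i)}$ equal $2^{-i/2}$. The graphical function $2^{i/2} g_0^{(i)}$ then takes only the values 0 and 1, that is, it is a characteristic function. The function names are ours.

```python
def matvec(D, n):
    return (D[0][0] * n[0] + D[0][1] * n[1], D[1][0] * n[0] + D[1][1] * n[1])


def iterate_taps(D, taps, i):
    """g^(i)[n] = sum_k g^(i-1)[k] g0[n - D k] for a lowpass equal to 1/sqrt(2)
    on each point of `taps`. Returns a dict mapping n -> g^(i)[n]."""
    g = {(0, 0): 1.0}
    tap = 2 ** -0.5
    for _ in range(i):
        new = {}
        for k, val in g.items():
            Dk = matvec(D, k)
            for t in taps:
                n = (Dk[0] + t[0], Dk[1] + t[1])
                new[n] = new.get(n, 0.0) + val * tap
        g = new
    return g


D_Q1 = [[1, -1], [1, 1]]         # "rotation" quincunx matrix
i = 10
g = iterate_taps(D_Q1, [(0, 0), (1, 0)], i)
graph_values = {round(2 ** (i / 2) * v, 12) for v in g.values()}
print(f"support points: {len(g)} (2^{i} = {2 ** i}), graphical-function values: {graph_values}")
```

Plotting the cells $D_{Q_1}^{-i}(n + [0, 1)^2)$ for the points `n` in `g` would trace an approximation of the white region in Figure 4.29(a). Each cell has area $2^{-i}$, so the total area is 1.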



4.7.4 Design of Multidimensional Wavelets 

As we have seen in Section 3.6, the design of multidimensional filter banks is not 
easy, and it becomes all the more involved by introducing the requirement that the 
lowpass filter be regular. Here, known techniques will be briefly reviewed; for more details the reader is referred to [57] and [163].

Figure 4.30 The sixth iteration of the smallest regular two-dimensional filter.

Direct Design To achieve perfect reconstruction in a subband system, cascade structures are perfect candidates (see Section 3.6). Besides perfect reconstruction, other properties such as orthogonality and linear phase can easily be imposed on them.

Recall that in one dimension, a zero of a sufficiently high order at $\pi$ would guarantee the desired degree of regularity. Unfortunately, imposing a zero of a particular order in multiple dimensions becomes a nontrivial problem, and thus algebraic solutions can be obtained only for very small filters.

As an example of direct design, consider again the quincunx case with matrix $D_1$ and the perfect reconstruction filter pair given in (3.6.7). The approach is to impose a zero of the highest possible order at $(\pi, \pi)$ on the lowpass filter in (3.6.7), that is,

$$\left.\frac{\partial^{k-1} H_0(\omega_1, \omega_2)}{\partial \omega_1^{l}\, \partial \omega_2^{k-l-1}}\right|_{(\pi, \pi)} = 0, \qquad k = 1, \ldots, m, \quad l = 0, \ldots, k - 1.$$

Upon imposing a second-order zero, the following solutions are obtained:

$$a_0 = \pm\sqrt{3}, \quad a_1 = \pm\sqrt{3}, \quad a_2 = 2 \pm \sqrt{3}, \tag{4.7.8}$$

$$a_0 = \pm\sqrt{3}, \quad a_1 = 0, \quad a_2 = 2 \pm \sqrt{3}. \tag{4.7.9}$$

Note that the filters should be scaled by $(1 - \sqrt{3})/(4\sqrt{2})$. The solution in (4.7.9) is the one-dimensional $D_2$ filter, while (4.7.8) would be the smallest "regular" two-dimensional filter (actually, a counterpart of $D_2$). Figure 4.30 shows the fourth iteration of this solution. As can be seen from the plot, the function looks continuous, but not differentiable at some points.

As a simple check of continuity, the largest first-order differences of the iterated filter can be computed. In this case, these differences decrease at an almost constant rate, a good indicator that the function is continuous [163]. Recently, a method for checking the continuity was developed [324]. Using this method, it was confirmed that this solution indeed leads to a continuous scaling function and consequently a continuous wavelet.
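The first-order-difference check mentioned above is easy to sketch. The coefficients of the two-dimensional filter (4.7.8) are not reproduced in this section, so the sketch applies the same idea to the one-dimensional four-tap Daubechies filter $D_2$, which (4.7.9) reduces to. A two-dimensional version would replace the one-dimensional iteration with the quincunx iteration (4.7.3). Like the check in the text, this is only a heuristic indicator of continuity, not a proof.

```python
import math


def iterate(h, i):
    """Taps of H^(i)(z) = prod_{k=0}^{i-1} H(z^(2^k))."""
    out = [1.0]
    for k in range(i):
        step = 2 ** k
        new = [0.0] * (len(out) + step * (len(h) - 1))
        for n, x in enumerate(out):
            for m, hm in enumerate(h):
                new[n + m * step] += x * hm
        out = new
    return out


def max_first_difference(h, i):
    """Largest jump of the graphical function 2^(i/2) h^(i)[n] between neighbouring
    cells, padding with zeros so the jumps at both ends of the support count too."""
    phi = [0.0] + [2 ** (i / 2) * x for x in iterate(h, i)] + [0.0]
    return max(abs(b - a) for a, b in zip(phi, phi[1:]))


s3, r2 = math.sqrt(3.0), math.sqrt(2.0)
d2 = [(1 + s3) / (4 * r2), (3 + s3) / (4 * r2), (3 - s3) / (4 * r2), (1 - s3) / (4 * r2)]
diffs = [max_first_difference(d2, i) for i in range(2, 11)]
for i, (d, d_next) in enumerate(zip(diffs, diffs[1:]), start=2):
    print(f"i={i}: max difference {d:.4f}, ratio to next iteration {d_next / d:.3f}")
```

A roughly constant ratio below 1 between successive iterations is the behavior the text describes as a good sign of continuity. A ratio that stays at or above 1 would suggest a discontinuous limit.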

This method, however, fails for larger size filters, since imposing a zero of a par- 
ticular order means solving a large system of nonlinear equations (in the orthogonal 
case). Note, however, that numerical approaches are always possible [162]. 

One to Multidimensional Transformations Another way to approach the design problem is to use transformations of one-dimensional filters into multidimensional ones in such a way that [163]

(a) Perfect reconstruction is preserved (in order to have a valid subband coding system).

(b) Zeros at aliasing frequencies are preserved (necessary but not sufficient for regularity).

We have already discussed how to obtain perfect reconstruction in Section 3.6. Here, 
we will concern ourselves only with properties that might be of interest for designing 
wavelets. If we used the method of separable polyphase components, an advantage 
is that the zeros at aliasing frequencies carry over into multiple dimensions. As we 
pointed out in Section 3.6, the disadvantage is that only IIR solutions are possible, 
and thus we cannot obtain wavelets with compact support. In the McClellan case, 
however, wavelets with compact support are possible, but not orthonormal ones. 
For more details on these issues, see [163]. 

4.8 Local Cosine Bases 

At the beginning of this chapter (see Section 4.1.2), we examined a piecewise Fourier 
series expansion that was an orthogonal local Fourier basis. Unfortunately, because 
the basis functions were truncated complex exponentials (and thus discontinuous), 
they achieved poor frequency localization (actually, the time-bandwidth product of 
the basis functions is unbounded). Because of the Balian-Low Theorem [73], there 
are no "good" orthogonal bases in the Gabor or windowed Fourier transform case 
(see Chapter 5). However, if instead of using modulation by complex exponen- 
tials, one uses modulation by cosines, it turns out that good orthonormal bases do 
exist, as will be shown next. This result is the continuous-time equivalent of the 
modulated lapped orthogonal transforms, seen in Section 3.4. 
Figure 4.31 Relationship among windows for the local cosine bases. (a) Rectangular
window. All windows are the same. (b) Smooth window satisfying the power
complementary condition. All windows are the same. (c) General case.



We will start with a simple case which, when refined, will lead to what Meyer
calls "Malvar's wavelets" [193]. Note that, besides this construction, there exist
other orthonormal bases with similar properties [61]. Thus, consider the following
set of basis functions:

$$\varphi_{j,k}(t) = \sqrt{\frac{2}{L_j}}\; w_j(t)\cos\left[\frac{\pi}{L_j}\left(k+\frac{1}{2}\right)(t-a_j)\right], \qquad (4.8.1)$$

for $k = 0, 1, 2, \ldots$ and $j \in \mathbb{Z}$, where $a_j$ is an increasing sequence of real
numbers, $L_j = a_{j+1} - a_j$, and the window function $w_j(t)$ is centered around the
interval $[a_j, a_{j+1}]$. As can be seen, (4.8.1) is the continuous-time counterpart of
(3.4.17) seen in the discrete-time case.
4.8.1 Rectangular Window

Let us start with the simplest case and assume that

$$a_j = \left(j - \frac{1}{2}\right)L, \qquad a_{j+1} - a_j = L, \qquad (4.8.2)$$

while the "window" functions $w_j(t)$ will be restricted as (see Figure 4.31(a))

$$w_j(t) = w(t - jL), \qquad w(t) = \frac{1}{\sqrt{2}}, \quad -L < t < L.$$



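That is, we have rectangular windows which overlap by $L$ with their neighbors, as
given in Figure 4.31(a). Thus, the basis functions from (4.8.1) become

$$\varphi_{j,k}(t) = \frac{1}{\sqrt{L}}\cos\left[\frac{\pi}{L}\left(k+\frac{1}{2}\right)\left(t - jL + \frac{L}{2}\right)\right], \qquad (j-1)L < t < (j+1)L.$$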



To prove that this set of functions forms a basis, we have to demonstrate the
orthogonality of the basis functions, as well as completeness. Since the proof of
completeness is quite involved, we refer the reader to [61] for details (note that in
[61], the proof is given for a slightly different set of basis functions, but the idea is
the same). As for orthogonality, first note that $\varphi_{j,k}$ and $\varphi_{j',m}$ do not overlap for
$|j - j'| \ge 2$. To prove that $\varphi_{j,k}$ and $\varphi_{j+1,m}$ are mutually orthogonal, write

$$\langle\varphi_{j,k}, \varphi_{j+1,m}\rangle = \frac{1}{L}\int_{jL}^{(j+1)L}\cos\left[\frac{\pi}{L}\left(k+\frac{1}{2}\right)\left(t - jL + \frac{L}{2}\right)\right]\cos\left[\frac{\pi}{L}\left(m+\frac{1}{2}\right)\left(t - (j+1)L + \frac{L}{2}\right)\right]dt, \qquad (4.8.3)$$



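which, with the change of variable $x = t - (j+1)L + L/2$, becomes

$$\langle\varphi_{j,k}, \varphi_{j+1,m}\rangle = \frac{(-1)^{k+1}}{L}\int_{-L/2}^{L/2}\sin\left[\frac{\pi}{L}\left(k+\frac{1}{2}\right)x\right]\cos\left[\frac{\pi}{L}\left(m+\frac{1}{2}\right)x\right]dx = 0,$$

since the integrand is an odd function of $x$. (Here we used
$\cos[\theta + (k+\frac{1}{2})\pi] = (-1)^{k+1}\sin\theta$.)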

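Finally, orthogonality of $\varphi_{j,k}$ and $\varphi_{j,m}$ for $k \neq m$ follows from (again with the
change of variable $x = t - jL$)

$$\langle\varphi_{j,k}, \varphi_{j,m}\rangle = \frac{1}{L}\int_{-L}^{L}\cos\left[\frac{\pi}{L}\left(k+\frac{1}{2}\right)\left(x+\frac{L}{2}\right)\right]\cos\left[\frac{\pi}{L}\left(m+\frac{1}{2}\right)\left(x+\frac{L}{2}\right)\right]dx = 0,$$

since the product of the two cosines is a sum of cosines with frequencies $(k-m)\pi/L$
and $(k+m+1)\pi/L$, each integrated over a whole number of periods.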



What we have effectively constructed is a set of basis functions obtained from
cosines of various frequencies, shifted in time to the points $jL$ on the time axis, and
multiplied by a rectangular window of length $2L$.
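As a numerical sanity check of the orthogonality argument above, the following Python sketch samples the rectangular-window functions $\varphi_{j,k}(t)$ on a fine grid and approximates their Gram matrix with the midpoint rule. The spacing $L = 1$, the index ranges, and the grid density are arbitrary choices made only for illustration; a Gram matrix close to the identity is consistent with orthonormality but does not, of course, prove it.

```python
import math
from operator import mul

L = 1.0                  # window spacing (illustrative choice)
CELLS_PER_L = 4000       # midpoint-rule cells per length L
LO, HI = -L, 3 * L       # covers the supports of phi_{j,k} for j = 0, 1, 2

def phi_rect(j, k, t):
    """Rectangular-window local cosine:
    L^{-1/2} cos[(pi/L)(k + 1/2)(t - jL + L/2)] for (j-1)L < t < (j+1)L, else 0."""
    if (j - 1) * L < t < (j + 1) * L:
        return math.cos(math.pi / L * (k + 0.5) * (t - j * L + L / 2)) / math.sqrt(L)
    return 0.0

n_cells = int(round((HI - LO) / L * CELLS_PER_L))
h = (HI - LO) / n_cells
# Cell midpoints; the window edges (multiples of L) fall on cell boundaries.
grid = [LO + (i + 0.5) * h for i in range(n_cells)]

indices = [(j, k) for j in (0, 1, 2) for k in range(4)]
samples = {idx: [phi_rect(idx[0], idx[1], t) for t in grid] for idx in indices}

def inner(p, q):
    """Midpoint-rule approximation of <phi_p, phi_q> (all functions are real)."""
    return h * sum(map(mul, samples[p], samples[q]))

gram = [[inner(p, q) for q in indices] for p in indices]
max_dev = max(abs(gram[r][c] - (1.0 if r == c else 0.0))
              for r in range(len(indices)) for c in range(len(indices)))
print(f"max |<phi_p, phi_q> - delta_pq| = {max_dev:.1e}")
```

Functions with $|j - j'| \ge 2$ have disjoint supports, so their sampled inner product is exactly zero; the remaining entries are small only up to quadrature error.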
4.8.2 Smooth Window

Suppose now that we still keep the regular spacing of $L$ between shifts as in (4.8.2),
but allow for a smooth window of length $2L$ satisfying the following (see Figure 4.31(b)):

$$w_j(t) = w(t - jL), \qquad w(t) = w(-t), \quad -L < t < L, \qquad w^2(t) + w^2(L-t) = 1, \quad 0 \le t \le L, \qquad (4.8.4)$$

and the basis functions are as in (4.8.1) (see Figure 4.31(b)). Note that here, on
top of the cosines overlapping, we have to deal with the windowing of the cosines. To
prove orthogonality, again we need to demonstrate it only for $\varphi_{j,k}$ and $\varphi_{j+1,m}$,
as well as for $\varphi_{j,k}$ and $\varphi_{j,m}$.

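By using the same change of variable as in (4.8.3), we obtain that

$$\langle\varphi_{j,k}, \varphi_{j+1,m}\rangle = \frac{2(-1)^{k+1}}{L}\int_{-L/2}^{L/2} w\left(t+\frac{L}{2}\right)w\left(t-\frac{L}{2}\right)\sin\left[\frac{\pi}{L}\left(k+\frac{1}{2}\right)t\right]\cos\left[\frac{\pi}{L}\left(m+\frac{1}{2}\right)t\right]dt.$$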



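Since $w(t + L/2)\,w(t - L/2)$ is an even function of $t$ (because $w$ is even), while the
rest is an odd function of $t$ as before, the above inner product is zero. For
orthogonality of $\varphi_{j,k}$ and $\varphi_{j,m}$, write (with $t$ now standing for $t - jL$)

$$\langle\varphi_{j,k}, \varphi_{j,m}\rangle = \frac{2}{L}\int_{-L}^{L} w^2(t)\cos\left[\frac{\pi}{L}\left(k+\frac{1}{2}\right)\left(t+\frac{L}{2}\right)\right]\cos\left[\frac{\pi}{L}\left(m+\frac{1}{2}\right)\left(t+\frac{L}{2}\right)\right]dt.$$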



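Divide the above integral into three parts: from $-L$ to $-L/2$, from $-L/2$ to $L/2$,
and from $L/2$ to $L$. Let us concentrate on the last one. With the change of variable
$x = L - t$, it becomes

$$\frac{2}{L}\int_{L/2}^{L} w^2(t)\cos\left[\frac{\pi}{L}\left(k+\frac{1}{2}\right)\left(t+\frac{L}{2}\right)\right]\cos\left[\frac{\pi}{L}\left(m+\frac{1}{2}\right)\left(t+\frac{L}{2}\right)\right]dt = \frac{2}{L}\int_{0}^{L/2} w^2(L-x)\cos\left[\frac{\pi}{L}\left(k+\frac{1}{2}\right)\left(\frac{3}{2}L-x\right)\right]\cos\left[\frac{\pi}{L}\left(m+\frac{1}{2}\right)\left(\frac{3}{2}L-x\right)\right]dx.$$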



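However, since $\cos[(\pi/L)(k+1/2)((3/2)L - x)] = -\cos[(\pi/L)(k+1/2)(x + L/2)]$
(the two arguments add up to $(2k+1)\pi$), we can merge this integral with the part of
the second one running from $0$ to $L/2$. Using the same argument for the integral
from $-L$ to $-L/2$ (with the change of variable $x = -L - t$ and the symmetry of $w$),
we finally obtain

$$\langle\varphi_{j,k}, \varphi_{j,m}\rangle = \frac{2}{L}\int_{-L/2}^{L/2}\left(w^2(t) + w^2(L-|t|)\right)\cos\left[\frac{\pi}{L}\left(k+\frac{1}{2}\right)\left(t+\frac{L}{2}\right)\right]\cos\left[\frac{\pi}{L}\left(m+\frac{1}{2}\right)\left(t+\frac{L}{2}\right)\right]dt = 0,$$

since $w^2(t) + w^2(L-|t|) = 1$ on $[-L/2, L/2]$ by (4.8.4) and the symmetry of $w$, and the
remaining integral of the two cosines over an interval of length $L$ is zero for $k \neq m$
(expanding the product into a sum of cosines, each term integrates to a multiple of
$\sin(n\pi) = 0$).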



We now see why it was important for the window to satisfy the power complemen- 
tary condition given in (4.8.4), exactly as in the discrete-time case. Therefore, we 
have progressed from a rectangular window to a smooth window. 
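The role of the power complementary condition can also be checked numerically. The sketch below uses an assumed example window, not one taken from the text: $w(t) = \cos[(\pi/2)\,\theta(|t|/L)]$ with the cubic $\theta(x) = 3x^2 - 2x^3$ on $[0, 1]$. Because this $\theta$ satisfies $\theta(x) + \theta(1-x) = 1$, the window is even and satisfies $w^2(t) + w^2(L-t) = 1$, as (4.8.4) requires (it is continuously differentiable, not infinitely smooth). The sketch verifies that identity on a grid and then approximates the Gram matrix of a few functions $\varphi_{j,k}$ by the midpoint rule; $L = 1$ and the index ranges are illustrative choices.

```python
import math
from operator import mul

L = 1.0
CELLS_PER_L = 4000

def theta(x):
    """Assumed smooth step (for illustration only): 0 for x <= 0, 1 for x >= 1,
    3x^2 - 2x^3 in between, so that theta(x) + theta(1 - x) = 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return x * x * (3.0 - 2.0 * x)

def w(t):
    """Even, continuously differentiable window supported on [-L, L]."""
    if abs(t) >= L:
        return 0.0
    return math.cos(math.pi / 2 * theta(abs(t) / L))

def phi_smooth(j, k, t):
    """sqrt(2/L) w(t - jL) cos[(pi/L)(k + 1/2)(t - jL + L/2)]: (4.8.1) with (4.8.2)."""
    return (math.sqrt(2.0 / L) * w(t - j * L)
            * math.cos(math.pi / L * (k + 0.5) * (t - j * L + L / 2)))

# 1) Power complementarity w^2(t) + w^2(L - t) = 1 on a grid of [0, L].
pc_err = max(abs(w(t) ** 2 + w(L - t) ** 2 - 1.0)
             for t in (L * i / 1000 for i in range(1001)))

# 2) Gram matrix of phi_{j,k}, j = 0, 1, 2, k = 0..3, by the midpoint rule on [-L, 3L].
LO, HI = -L, 3 * L
n_cells = int(round((HI - LO) / L * CELLS_PER_L))
h = (HI - LO) / n_cells
grid = [LO + (i + 0.5) * h for i in range(n_cells)]
indices = [(j, k) for j in (0, 1, 2) for k in range(4)]
samples = {idx: [phi_smooth(idx[0], idx[1], t) for t in grid] for idx in indices}
gram = [[h * sum(map(mul, samples[p], samples[q])) for q in indices] for p in indices]
max_dev = max(abs(gram[r][c] - (1.0 if r == c else 0.0))
              for r in range(len(indices)) for c in range(len(indices)))
print(f"power-complementarity error {pc_err:.1e}, Gram deviation {max_dev:.1e}")
```

Replacing `w` by a window that is even but not power complementary (for example, a plain triangle) should make the diagonal entries of the Gram matrix drift away from 1, which is the discrete counterpart of the argument above.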
4.8.3 General Window

The final step is to lift the restriction on $a_j$ imposed in (4.8.2) and to allow the
windows $w_j(t)$ to differ from one another. We outline the general construction [61].
The proofs of orthogonality will be omitted, however, since they follow similarly to the
two simpler cases discussed above; they are left as an exercise for the reader (see
Problem 4.22). For the proof of completeness, we again refer the reader to [61]
(although for a slightly different set of basis functions).

Assume, thus, that we have an increasing sequence of real numbers $a_j$, $j \in \mathbb{Z}$,
$\ldots < a_{j-1} < a_j < a_{j+1} < \ldots$. We will denote by $L_j$ the distance between $a_{j+1}$ and
$a_j$, $L_j = a_{j+1} - a_j$. We will also assume that we are given a sequence of numbers
$\eta_j > 0$ such that $\eta_j + \eta_{j+1} \le L_j$, $j \in \mathbb{Z}$, which ensures that windows will only
overlap with their nearest neighbors. The given windows $w_j(t)$ will be differentiable
(possibly infinitely) and of compact support, with the following requirements:

(a) $0 \le w_j(t) \le 1$, and $w_j(t) = 1$ if $a_j + \eta_j \le t \le a_{j+1} - \eta_{j+1}$.

(b) $w_j(t)$ is supported within $[a_j - \eta_j, a_{j+1} + \eta_{j+1}]$.

(c) If $|t - a_j| \le \eta_j$, then $w_{j-1}(t) = w_j(2a_j - t)$ and $w_{j-1}^2(t) + w_j^2(t) = 1$.

This last condition ensures that the "tails" of adjacent windows are power
complementary. An example of such a window is $w_j(t) = \sin[(\pi/2)\,\theta((t - a_j + \eta_j)/(2\eta_j))]$
for $|t - a_j| \le \eta_j$, and $w_j(t) = \cos[(\pi/2)\,\theta((t - a_{j+1} + \eta_{j+1})/(2\eta_{j+1}))]$ for
$|t - a_{j+1}| \le \eta_{j+1}$. Here, $\theta(t)$ is the function we used for constructing the Meyer
wavelet, given in (4.3.1), Section 4.3.1. With these conditions, the set of functions
in (4.8.1) forms an orthonormal basis for $L^2(\mathbb{R})$. It helps to visualize the above
conditions on the windows as in Figure 4.31(c). In this most general case, the support
of $w_j(t)$ can have any length from $2L_j$ (when $\eta_j + \eta_{j+1} = L_j$) down to $L_j$ (in the
limit $\eta_j, \eta_{j+1} \to 0$, where the window becomes constant with height 1), and the window
is otherwise arbitrary as long as it satisfies the above three conditions.

Let us see what has been achieved. The time-domain functions are local and 
smooth and their Fourier transforms have arbitrary polynomial decay (depending 
on the smoothness or differentiability of the window). Thus, the time-bandwidth 
product is now finite (unlike in the piecewise Fourier series case), and we have a 
local modulated basis with good time-frequency localization. 

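APPENDIX 4.A Proof of Theorem 4.5

Proof

As mentioned previously, what follows is a brief outline of the proof; for more details,
refer to [71].

(a) It can be shown that

$$\sum_k \left(g_0[n-2k]\,\varphi_{j,k} + g_1[n-2k]\,\psi_{j,k}\right) = \varphi_{j-1,n}.$$

(b) Using this, it can be shown that

$$\sum_n |\langle\varphi_{j-1,n}, f\rangle|^2 = \sum_k |\langle\varphi_{j,k}, f\rangle|^2 + \sum_k |\langle\psi_{j,k}, f\rangle|^2.$$

(c) Then, by iteration, for all $N \in \mathbb{N}$

$$\sum_n |\langle\varphi_{-N,n}, f\rangle|^2 = \sum_k |\langle\varphi_{N,k}, f\rangle|^2 + \sum_{j=-N+1}^{N}\sum_k |\langle\psi_{j,k}, f\rangle|^2. \qquad (4.A.1)$$

(d) It can be shown that

$$\lim_{N\to\infty}\sum_k |\langle\varphi_{N,k}, f\rangle|^2 = 0,$$

and thus the limit of (4.A.1) reduces to

$$\lim_{N\to\infty}\sum_n |\langle\varphi_{-N,n}, f\rangle|^2 = \lim_{N\to\infty}\sum_{j=-N+1}^{N}\sum_k |\langle\psi_{j,k}, f\rangle|^2. \qquad (4.A.2)$$

(e) Concentrating on the left side of (4.A.2), one can write

$$\sum_k |\langle\varphi_{-N,k}, f\rangle|^2 = 2\pi\int|\Phi(2^{-N}\omega)|^2\,|F(\omega)|^2\,d\omega + R,$$

with $|R| \le C\,2^{-3N/2}$, and thus $\lim_{N\to\infty}|R| = 0$ and

$$\lim_{N\to\infty}\sum_k |\langle\varphi_{-N,k}, f\rangle|^2 = \lim_{N\to\infty}2\pi\int|\Phi(2^{-N}\omega)|^2\,|F(\omega)|^2\,d\omega,$$

or again, substituting into (4.A.2),

$$\lim_{N\to\infty}\sum_{j=-N+1}^{N}\sum_k |\langle\psi_{j,k}, f\rangle|^2 = \sum_{j,k}|\langle\psi_{j,k}, f\rangle|^2 = \lim_{N\to\infty}2\pi\int|\Phi(2^{-N}\omega)|^2\,|F(\omega)|^2\,d\omega.$$

(f) Finally, the right side of the previous equation can be shown to be

$$\lim_{N\to\infty}2\pi\int|\Phi(2^{-N}\omega)|^2\,|F(\omega)|^2\,d\omega = \|f\|^2,$$

and

$$\sum_{j,k}|\langle\psi_{j,k}, f\rangle|^2 = \|f\|^2,$$

which completes the proof of the theorem.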
Problems

4.1 Consider the wavelet series expansion of continuous-time signals $f(t)$ and assume $\psi(t)$ is
the Haar wavelet.

(a) Give the expansion coefficients for $f(t) = 1$, $t \in [0, 1]$, and 0 otherwise (that is, the
scaling function $\varphi(t)$).

(b) Verify that $\sum_m \sum_n |\langle\psi_{m,n}, f\rangle|^2 = 1$ (Parseval's identity for the wavelet series
expansion).

(c) Consider $f'(t) = f(t - 2^{-i})$, where $i$ is a positive integer. Give the range of scales
over which expansion coefficients are different from zero.

(d) Same as above, but now $f'(t) = f(t - 1/\sqrt{2})$.

4.2 Consider a multiresolution analysis and the two-scale equation for $\varphi(t)$ given in (4.2.8).
Assume that $\{\varphi(t - n)\}$ is an orthonormal basis for $V_0$. Prove that

(a) $\|g_0[n]\| = 1$

(b) $g_0[n] = \sqrt{2}\,\langle\varphi(2t - n), \varphi(t)\rangle$.

4.3 In a multiresolution analysis with a scaling function $\varphi(t)$ satisfying orthonormality to its
integer shifts, consider the two-scale equation (4.2.8). Assume further $0 < |\Phi(0)| < \infty$ and
that $\Phi(\omega)$ is continuous in $\omega = 0$.

(a) Show that $\sum_n g_0[n] = \sqrt{2}$.

(b) Show that $\sum_n g_0[2n] = \sum_n g_0[2n + 1]$.

4.4 Consider the Meyer wavelet derived in Section 4.3.1 and given by equation (4.3.5). Prove
(4.3.6). Hint: in every interval $[2^k\pi/3,\, 2^{k+1}\pi/3]$ there are only two "tails" present.

4.5 A simple Meyer wavelet can be obtained by choosing $\theta(x)$ in (4.3.1) as

$$\theta(x) = \begin{cases} 0 & x \le 0, \\ x & 0 \le x \le 1, \\ 1 & 1 \le x. \end{cases}$$

(a) Derive the scaling function and wavelet in this case (in the Fourier domain).

(b) Discuss the decay in time of the scaling function and wavelet, and compare it to the
case when $\theta(x)$ given in (4.3.2) is used.

(c) Plot (numerically) the scaling function and wavelet.
4.6 Consider B-splines as discussed in Section 4.3.2.

(a) Verify that (4.3.11) is the DTFT of (4.3.12).

(b) Given that $\beta^{(2N+1)}(t) = \beta^{(N)}(t) * \beta^{(N)}(t)$, prove that

$$b^{(2N+1)}[n] = \int_{-\infty}^{\infty}\beta^{(N)}(t)\,\beta^{(N)}(t-n)\,dt.$$

(This is an alternate proof of (4.3.23).)

(c) Calculate $b^{(2N+1)}[n]$ for $N = 1$ and 2.

4.7 Battle-Lemarié wavelets: Calculate the Battle-Lemarié wavelet for the quadratic spline case
(see (4.3.26)-(4.3.27)).

4.8 Battle-Lemarié wavelets based on recursive filters: In the orthogonalization procedure of the
Battle-Lemarié wavelet (Section 4.3.2), there is a division by $\sqrt{B^{(2N+1)}(\omega)}$ (see (4.3.14),
(4.3.17)). Instead of taking a square root, one can perform a spectral factorization of
$B^{(2N+1)}(\omega)$ when $B^{(2N+1)}(\omega)$ is a polynomial in $e^{j\omega}$ (for example, (4.3.16)). For the linear
spline case (Section 4.3.2), perform a spectral factorization of $B^{(2N+1)}(\omega)$ into

$$B^{(2N+1)}(\omega) = R(e^{j\omega})\,R(e^{-j\omega}) = |R(e^{j\omega})|^2,$$

and derive $\Phi(\omega)$, $\varphi(t)$ (use the fact that $1/R(e^{j\omega})$ is a recursive filter and find the set $\{\alpha_n\}$)
and $G_0(e^{j\omega})$. Indicate also $\Psi(\omega)$ in this case.

4.9 Prove that if $g(t)$, the nonorthogonal basis for $V_0$, has compact support, then $D(\omega)$ in (4.3.20)
is a trigonometric polynomial and has a stable (possibly noncausal) spectral factorization.

4.10 Orthogonality relations of Daubechies' wavelets: Prove Relations (b) and (c) in Proposition 4.4,
namely:

(a) $\langle\psi(t - n), \psi(t - n')\rangle = \delta[n - n']$ (where we skipped the scaling factor for simplicity)

(b) $\langle\varphi(t - n), \psi(t - n')\rangle = 0$.

4.11 Infinite products and the Haar scaling function:

(a) Consider the following infinite product:

$$p_k = \prod_{i=0}^{k} a^{b^i}, \qquad |b| < 1,$$

and show that its limit as $k \to \infty$ is

$$\lim_{k\to\infty} p_k = a^{1/(1-b)}.$$

(b) In Section 4.4.1, we derived the Haar scaling function as the limit of a graphical
function, showing that it was equal to the indicator function of the unit interval.
Starting from the Haar lowpass filter $G_0(z) = (1 + z^{-1})/\sqrt{2}$ and its normalized version
$M_0(\omega) = G_0(e^{j\omega})/\sqrt{2}$, show that from (4.4.14),

$$\Phi(\omega) = \prod_{k=1}^{\infty} M_0\left(\frac{\omega}{2^k}\right) = e^{-j\omega/2}\,\frac{\sin(\omega/2)}{\omega/2}.$$

Hint: Use the identity $\cos(\omega) = \sin(2\omega)/(2\sin(\omega))$.

(c) Show, using (4.4.15), that the Haar wavelet is given by

$$\Psi(\omega) = j\,e^{-j\omega/2}\,\frac{\sin^2(\omega/4)}{\omega/4}.$$

4.12 Consider the product

$$\Phi^{(i)}(\omega) = \prod_{k=1}^{i} M_0\left(\frac{\omega}{2^k}\right),$$

where $M_0(\omega)$ is $2\pi$-periodic and satisfies $M_0(0) = 1$ as well as $|M_0(\omega)| \le 1$, $\omega \in [-\pi, \pi]$.

(a) Show that the infinite product $\Phi^{(i)}(\omega)$ converges pointwise to a limit $\Phi(\omega)$.

(b) Show that if $M_0(\omega) = (1/\sqrt{2})\,G_0(e^{j\omega})$ and $G_0(e^{j\omega})$ is the lowpass filter in an orthogonal
filter bank, then $|M_0(\omega)| \le 1$ is automatically satisfied and $M_0(0) = 1$ implies
$M_0(\pi) = 0$.

4.13 Maximally flat Daubechies' filters: A proof of the closed-form formula for the autocorrelation
of the Daubechies' filter (4.4.34) can be derived as follows (assume $Q = 0$). Rewrite (4.4.32)
as

Use Taylor series expansion of the first term and the fact that $\deg[P(y)] < N$ (which can
be shown using Euclid's algorithm) to prove (4.4.34).

4.14 Given the Daubechies' filters in Table 4.2 or 4.3, verify that they satisfy the regularity bound
given in Proposition 4.7. Do they meet higher regularity as well? (You might have to use
alternate factorizations or cascades.)

4.15 In an $N$-channel filter bank, show that at least one zero at all aliasing frequencies $2\pi k/N$,
$k = 1, \ldots, N - 1$, is necessary for the iterated graphical function to converge. Hint: See the
proof of Proposition 4.6.

4.16 Consider a filter $G_0(z)$ whose impulse response is orthonormal with respect to shifts by $N$.
Assume $G_0(z)$ has $K$ zeros at each of the aliasing frequencies $\omega = 2\pi k/N$, $k = 1, \ldots, N-1$.
Consider the iteration of $G_0(z)$ with respect to sampling rate change by $N$ and the
associated graphical function (see (4.6.11)-(4.6.12)). Prove that the condition given in (4.6.15)
is sufficient to ensure a continuous limit function $\varphi(t) = \lim_{i\to\infty}\varphi^{(i)}(t)$. Hint: The proof is
similar to that of Proposition 4.7.

4.17 Successive interpolation [131]: Given an input signal $x[n]$, we would like to compute an
interpolation by applying upsampling by 2 followed by filtering, and this $i$ times. Assume
that the interpolation filter $G(z)$ is symmetric and has zero phase, or $G(z) = g_0 + g_1 z +
g_{-1} z^{-1} + g_2 z^2 + g_{-2} z^{-2} + \ldots$

(a) After one step, we would like $y^{(1)}[2n] = x[n]$, while $y^{(1)}[2n+1]$ is interpolated. What
conditions does that impose on $G(z)$?

(b) Show that if condition (a) is fulfilled, then after $i$ iterations, we have $y^{(i)}[2^i n] = x[n]$
while other values are interpolated.

(c) Assume $G(z) = \frac{1}{2}z + 1 + \frac{1}{2}z^{-1}$. Given some input signal, sketch the output signal
$y^{(i)}[n]$ for some small $i$.

(d) Assume we associate a continuous-time function $y^{(i)}(t)$ with $y^{(i)}[n]$:

$$y^{(i)}(t) = y^{(i)}[n], \qquad n/2^i \le t < (n+1)/2^i.$$

What can you say about the limit function $y^{(i)}(t)$ as $i$ goes to infinity and $G(z)$ is as
in example (c)? Is the limit function continuous? Differentiable?

(e) Consider $G(z)$ to be the autocorrelation of the Daubechies' filters for $N = 2, \ldots, 6$,
that is, the $P(z)$ given in Table 4.2. Does this satisfy condition (a)? For $N = 2, \ldots, 6$,
consider the limit function $y^{(i)}(t)$ as $i$ goes to infinity and try to establish the
"regularity" of these limit functions (are they continuous, differentiable, etc.?).

4.18 Recursive subdivision schemes: Assume that a function $f(t)$ satisfies a two-scale equation
$f(t) = \sum_n c_n f(2t - n)$. We can recursively compute $f(t)$ at dyadic rationals with the
following procedure. Start with $f^{(0)}(t) = 1$ for $-1/2 < t < 1/2$, and 0 otherwise. In particular,
$f^{(0)}(0) = 1$ and $f^{(0)}(1) = f^{(0)}(-1) = 0$. Then, recursively compute

$$f^{(i)}(t) = \sum_n c_n f^{(i-1)}(2t - n).$$

In particular, at step $i$, one can compute the values $f^{(i)}(t)$ at $t = 2^{-i}n$, $n \in \mathbb{Z}$. This will
successively "refine" $f^{(i)}(t)$ to approach the limit $f(t)$, assuming it exists.

(a) Consider this successive refinement for $c_0 = 1$ and $c_1 = c_{-1} = 1/2$. What is the limit
$f^{(i)}(t)$ as $i \to \infty$?

(b) A similar refinement scheme can be applied to a discrete-time sequence $s[n]$. Create
a function $g^{(0)}(t) = s[n]$ at $t = n$. Then, define

»<»( 5 £t) - »"-"(^ 

To what function $g(t)$ does this converge in the limit of $i \to \infty$? This scheme is
sometimes called bilinear interpolation; explain why.

(c) A more elaborate successive refinement scheme is based on the two-scale equation

$$f(x) = f(2x) + \frac{9}{16}\left[f(2x+1) + f(2x-1)\right] - \frac{1}{16}\left[f(2x+3) + f(2x-3)\right].$$

Answer parts (a) and (b) for this scheme. (Note: the limit $f(x)$ has no simple closed-form
expression.)

4.19 Interpolation filters and functions: A filter with impulse response $g[n]$ is called an
interpolation filter with respect to upsampling by 2 if $g[2n] = \delta[n]$. A continuous-time function
$f(t)$ is said to have the interpolation property if $f(n) = \delta[n]$. Examples of such functions
are the sinc and the hat function.

(a) Show that if $g[n]$ is an interpolation filter and the graphical function $\varphi^{(i)}(t)$ associated
with the iterated filter $g^{(i)}[n]$ converges pointwise, then the limit $\varphi(t)$ has the
interpolation property.

(b) Show that if $g[n]$ is a finite-length orthogonal lowpass filter, then the only solution
leading to an interpolation filter is the Haar lowpass filter (or variations thereof).

(c) Show that if $\varphi(t)$ has the interpolation property and satisfies a two-scale equation

$$\varphi(t) = \sum_n c_n\,\varphi(2t - n),$$

then $c_{2l} = \delta[l]$, that is, the sequence $c_n$ is an interpolation filter.

4.20 Assume a continuous scaling function $\varphi(t)$ with decay $O(1/|t|^{1+\epsilon})$, $\epsilon > 0$, satisfying the
two-scale equation

$$\varphi(t) = \sum_n c_n\,\varphi(2t - n).$$

Show that $\sum_n c_{2n} = \sum_n c_{2n+1} = 1$ implies that

$$f(t) = \sum_n \varphi(t - n) = \text{constant} \neq 0.$$

Hint: Show that $f(t) = f(2t)$.

4.21 Assume a continuous and differentiable function $\varphi(t)$ satisfying a two-scale equation

$$\varphi(t) = \sum_n c_n\,\varphi(2t - n),$$

where $\sum_n c_{2n} = \sum_n c_{2n+1} = 1$. Show that $\varphi'(t)$ satisfies a two-scale equation and show
this graphically in the case of the hat function (which is differentiable almost everywhere).

4.22 Prove the orthogonality relations for the set of basis functions (4.8.1) in the most general
setting, that is, when the windows $w_j(t)$ satisfy conditions (a)-(c) given at the end of Section
4.8.



Continuous Wavelet and Short-Time Fourier 
Transforms and Frames 



"Man lives between the infinitely large 

and the infinitely small. " 

- Blaise Pascal, Thoughts 



In this chapter, we consider expansions of continuous-time functions in terms of
two variables, such as shift and scale for the wavelet transform, or shift and
frequency for the short-time Fourier transform. That is, a one-variable function is
mapped into a two-variable function. This representation is redundant but has
interesting features which will be studied here. Because of the redundancy, the
parameters of the expansion can be discretized, leading to overcomplete series
expansions called frames.

Recall Section 2.6.4, where we saw that one can define the continuous wavelet
transform of a function as an inner product between the function itself and shifted
and scaled versions of a single function, the mother wavelet. The mother wavelet we
chose was not arbitrary; rather, it satisfied a zero-mean condition. This condition
follows from the "admissibility condition" on the mother wavelet, which will be
discussed in the next section. At the same time, we saw that the resulting transform
depended on two parameters, shift and scale, leading to a representation we denote,
for a function $f(t)$, by $CWT_f(a,b)$, where $a$ stands for scale and $b$ for shift. Since
these two parameters continuously span the real plane (except that the scale cannot
be zero), the resulting representation is highly redundant.
A similar situation exists in the short-time Fourier transform case (see Section 2.6.3).
There, the function is represented in terms of shifted and modulated versions of
a basic window function $w(t)$. As for the wavelet transform, the span of the shift
and frequency parameters leads to a redundant representation, which we denote by
$STFT_f(\omega, \tau)$, where $\omega$ and $\tau$ stand for frequency and shift, respectively.

Because of the high redundancy in both $CWT_f(a,b)$ and $STFT_f(\omega,\tau)$, it is
possible to discretize the transform parameters and still be able to achieve
reconstruction. In the STFT case, a rectangular grid over the $(\omega, \tau)$ plane can be used,
of the form $(m\,\omega_0,\, n\,\tau_0)$, $m, n \in \mathbb{Z}$, with $\omega_0$ and $\tau_0$ sufficiently small ($\omega_0\tau_0 < 2\pi$).

In the wavelet transform case, a hyperbolic grid is used instead (with a dyadic
grid as a special case when scales are powers of 2). That is, the $(a, b)$ plane is
discretized into $(\pm a_0^m,\, a_0^m\, n\, b_0)$, $m, n \in \mathbb{Z}$. In this manner, large basis functions
(when $a_0^m$ is large) are shifted in large steps, while small basis functions are shifted
in small steps. In order for the sampling of the $(a, b)$ plane to be sufficiently fine,
$a_0$ has to be chosen sufficiently close to 1, and $b_0$ close to 0.
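To make the difference between the two sampling patterns concrete, the following Python sketch lists grid points inside a finite time window. The parameter values ($\omega_0 = \pi/2$, $\tau_0 = 1$, $a_0 = 2$, $b_0 = 1$, shifts restricted to $[0, 16)$) are illustrative assumptions, and only positive scales are generated for the wavelet grid.

```python
import math

def stft_grid(omega0, tau0, m_range, t_max):
    """Rectangular STFT grid {(m*omega0, n*tau0)}, shifts restricted to [0, t_max):
    every frequency uses the same shift step tau0."""
    n_max = math.ceil(t_max / tau0)
    return [(m * omega0, n * tau0) for m in m_range for n in range(n_max)]

def wavelet_grid(a0, b0, m_range, t_max):
    """Hyperbolic grid {(a0**m, n * a0**m * b0)}, shifts restricted to [0, t_max).
    Only positive scales are listed; the grid in the text also includes -a0**m."""
    points = []
    for m in m_range:
        scale = a0 ** m
        step = scale * b0                  # the shift step grows with the scale
        n_max = math.ceil(t_max / step)
        points.extend((scale, n * step) for n in range(n_max))
    return points

# STFT: omega0 * tau0 = pi/2 < 2*pi; 16 shifts at each of 4 frequencies.
stft = stft_grid(math.pi / 2, 1.0, range(4), t_max=16.0)

# Dyadic wavelet grid (a0 = 2, b0 = 1): the number of shifts halves at each coarser scale.
dyadic = wavelet_grid(2.0, 1.0, range(4), t_max=16.0)
shifts_per_scale = {}
for scale, _ in dyadic:
    shifts_per_scale[scale] = shifts_per_scale.get(scale, 0) + 1
print(len(stft), shifts_per_scale)   # 64 {1.0: 16, 2.0: 8, 4.0: 4, 8.0: 2}
```

In the STFT grid every frequency gets the same number of shifts, whereas in the wavelet grid the number of shifts per unit time is inversely proportional to the scale, which is the hyperbolic pattern described above.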

These discretized versions of the continuous transforms are examples of frames, 
which can be seen as overcomplete series expansions (a brief review of frames is 
given in Section 5.3.2). Reconstruction formulas are possible, but depend on the 
sampling density. In general, they require different synthesis functions than analysis 
functions, except in a special case, called a tight frame. Then, the frame behaves 
just as an orthonormal basis, except that the set of functions used to expand the 
signal is redundant and thus the functions are not independent. 

An interesting question is the following: Can one discretize the parameters in the 
discussed continuous transforms such that the corresponding set of functions is an 
orthonormal basis? From Chapter 4, we know that this can be done for the wavelet 
case, with $a_0 = 2$, $b_0 = 1$, and an appropriate wavelet (which is a constrained
function). For the STFT, the answer is less obvious and will be investigated in 
this chapter. However, as a rule, we can already hint at the fact that when the 
sampling is highly redundant (or, the set of functions is highly overcomplete), we 
have great freedom in choosing the prototype function. At the other extreme, 
when the sampling becomes critical, that is, little or no redundancy exists between 
various functions used in the expansion, then possible prototype functions become 
very constrained. 

Historically, the first instance of a signal representation based on a localized 
Fourier transform is the Gabor transform [102], where complex sinusoids are win- 
dowed with a Gaussian window. It is also called a short-time Fourier transform and 
has been used extensively in speech processing [8, 226]. A continuous wavelet trans- 
form was first proposed by Morlet [119, 125], using a modulated Gaussian as the 
wavelet (called the Morlet wavelet). Morlet also proposed the inversion formula.¹
The discretization of the continuous transforms is related to the theory of frames, 
which has been studied in nonharmonic Fourier analysis [89]. Frames of wavelets 
and short-time Fourier transforms have been studied by Daubechies [72] and an ex- 
cellent treatment can be found in her book [73] as well, to which we refer for more 
details. A text that discusses both the continuous wavelet and short-time Fourier 
transforms is [108]. Several papers discuss these topics as well [10, 60, 99, 293]. 

Further discussions and possible applications of the continuous wavelet trans- 
form can be found in the work of Mallat and coworkers [182, 183, 184] for singularity 
detection, and in [36, 78, 253, 266] for multiscale signal analysis. Representations 
involving both scale and modulation are discussed in [185, 291]. Additional material 
can also be found in edited volumes on wavelets [51, 65, 251]. 

The outline of the chapter is as follows: The case of continuous transform 
variables is discussed in the first two sections. In Section 5.1 various properties 
of the continuous wavelet transform are derived. In particular, the "zooming" 
property, which allows one to characterize signals locally, is described. Comparisons 
are made with the STFT, which is presented in Section 5.2. Frames of wavelets and 
of the STFT are treated in Section 5.3. Tight frames are discussed, as well as the 
interplay of redundancy and freedom in the choice of the prototype basis function. 

5.1 Continuous Wavelet Transform 
5.1.1 Analysis and Synthesis 

Although the definition of the wavelet transform was briefly introduced in Section 2.6.4,
we repeat it here for completeness. Consider the family of functions
obtained by shifting and scaling a "mother wavelet" $\psi(t) \in L^2(\mathbb{R})$,

$$\psi_{a,b}(t) = \frac{1}{\sqrt{|a|}}\,\psi\left(\frac{t-b}{a}\right), \qquad (5.1.1)$$



where $a, b \in \mathbb{R}$ ($a \neq 0$), and the normalization ensures that $\|\psi_{a,b}(t)\| = \|\psi(t)\|$ (for
now, we assume that $a$ can be both positive and negative). In the following, we
will assume that the wavelet satisfies the admissibility condition

$$C_\psi = \int_{-\infty}^{\infty}\frac{|\Psi(\omega)|^2}{|\omega|}\,d\omega < \infty, \qquad (5.1.2)$$

where $\Psi(\omega)$ is the Fourier transform of $\psi(t)$. In practice, $\Psi(\omega)$ will always have
sufficient decay so that the admissibility condition reduces to the requirement that
$\Psi(0) = 0$ (from (2.4.7)-(2.4.8)):

$$\int_{-\infty}^{\infty}\psi(t)\,dt = \Psi(0) = 0.$$

¹ Morlet proposed the inversion formula based on intuition and numerical evidence. The story
goes that when he showed it to a mathematician for verification, he was told: "This formula, being
so simple, would be known if it were correct..."

Because the Fourier transform is zero at the origin and the spectrum decays at high
frequencies, the wavelet has a bandpass behavior. We now normalize the wavelet
so that it has unit energy, or

$$\int_{-\infty}^{\infty}|\psi(t)|^2\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty}|\Psi(\omega)|^2\,d\omega = 1.$$

As a result, $\|\psi_{a,b}(t)\|^2 = \|\psi(t)\|^2 = 1$ (see (5.1.1)). The continuous wavelet transform
of a function $f(t) \in L^2(\mathbb{R})$ is then defined as

$$CWT_f(a,b) = \int_{-\infty}^{\infty}\psi_{a,b}^*(t)\,f(t)\,dt = \langle\psi_{a,b}(t), f(t)\rangle. \qquad (5.1.3)$$

The function f(t) can be recovered from its transform by the following reconstruc- 
tion formula, also called resolution of the identity: 

Proposition 5.1 

Given the continuous wavelet transform $CWT_f(a,b)$ of a function $f(t) \in L^2(\mathbb{R})$
(see (5.1.3)), the function can be recovered by:

$$f(t) = \frac{1}{C_\psi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}CWT_f(a,b)\,\psi_{a,b}(t)\,\frac{da\,db}{a^2}, \qquad (5.1.4)$$

where reconstruction is in the $L^2$ sense (that is, the $L^2$ norm of the reconstruction
error is zero). This states that any $f(t)$ from $L^2(\mathbb{R})$ can be written
as a superposition of shifted and dilated wavelets.



Proof 



In order to simplify the proof, we will assume that $\psi(t) \in L^1$, $f(t) \in L^1 \cap L^2$, as well as
$F(\omega) \in L^1$ (or $f(t)$ is continuous) [108]. First, let us rewrite $CWT_f(a,b)$ in terms of the
Fourier transforms of the wavelet and signal. Note that the Fourier transform of $\psi_{a,b}(t)$ is

$$\Psi_{a,b}(\omega) = \sqrt{|a|}\,e^{-jb\omega}\,\Psi(a\omega).$$

According to Parseval's formula (2.4.11) given in Section 2.4.2, we get from (5.1.3)

$$CWT_f(a,b) = \int_{-\infty}^{\infty}\psi_{a,b}^*(t)f(t)\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty}\Psi_{a,b}^*(\omega)F(\omega)\,d\omega = \frac{\sqrt{|a|}}{2\pi}\int_{-\infty}^{\infty}\Psi^*(a\omega)F(\omega)e^{jb\omega}\,d\omega. \qquad (5.1.5)$$

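Note that the last integral is proportional to the inverse Fourier transform of $\Psi^*(a\omega)F(\omega)$
as a function of $b$. Let us now compute the integral over $b$ in (5.1.4), which we call $J(a)$,

$$J(a) = \int_{-\infty}^{\infty}CWT_f(a,b)\,\psi_{a,b}(t)\,db,$$

and substituting (5.1.5)

$$J(a) = \frac{\sqrt{|a|}}{2\pi}\int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty}\Psi^*(a\omega)F(\omega)e^{jb\omega}\,d\omega\right)\psi_{a,b}(t)\,db = \frac{\sqrt{|a|}}{2\pi}\int_{-\infty}^{\infty}\Psi^*(a\omega)F(\omega)\int_{-\infty}^{\infty}\psi_{a,b}(t)\,e^{jb\omega}\,db\,d\omega. \qquad (5.1.6)$$

The second integral in the above equation equals (with the substitution $b' = (t-b)/a$)

$$\int_{-\infty}^{\infty}\psi_{a,b}(t)\,e^{jb\omega}\,db = \frac{1}{\sqrt{|a|}}\int_{-\infty}^{\infty}\psi\left(\frac{t-b}{a}\right)e^{jb\omega}\,db = \sqrt{|a|}\,e^{j\omega t}\int_{-\infty}^{\infty}\psi(b')\,e^{-j\omega ab'}\,db' = \sqrt{|a|}\,e^{j\omega t}\,\Psi(a\omega). \qquad (5.1.7)$$

Therefore, substituting (5.1.7) into (5.1.6), $J(a)$ becomes equal to

$$J(a) = \frac{|a|}{2\pi}\int_{-\infty}^{\infty}|\Psi(a\omega)|^2\,F(\omega)\,e^{j\omega t}\,d\omega.$$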

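We now evaluate the integral in (5.1.4) over $a$ (the integral is multiplied by $C_\psi$):

$$\int_{-\infty}^{\infty}J(a)\,\frac{da}{a^2} = \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\frac{|\Psi(a\omega)|^2}{|a|}\,F(\omega)\,e^{j\omega t}\,d\omega\,da. \qquad (5.1.8)$$

Because of the restrictions we imposed on $f(t)$ and $\psi(t)$, we can change the order of
integration. We evaluate (use the change of variable $a' = a\omega$)

$$\int_{-\infty}^{\infty}\frac{|\Psi(a\omega)|^2}{|a|}\,da = \int_{-\infty}^{\infty}\frac{|\Psi(a')|^2}{|a'|}\,da' = C_\psi, \qquad (5.1.9)$$

that is, this integral is independent of $\omega$ (for $\omega \neq 0$), which is the key property that
makes it all work. It follows that (5.1.8) becomes (this is actually the right side of (5.1.4)
multiplied by $C_\psi$)

$$\frac{1}{2\pi}\int_{-\infty}^{\infty}F(\omega)\,e^{j\omega t}\,C_\psi\,d\omega = C_\psi\cdot f(t),$$

and thus, the inversion formula (5.1.4) is verified almost everywhere. It also becomes clear
why the admissibility condition (5.1.2) is required (see (5.1.9)).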

If we relax the conditions on $f(t)$ and $\psi(t)$, and require only that they belong to
$L^2(\mathbb{R})$, then the inversion formula still holds but the proof requires some finer arguments
[73, 108].
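The key step (5.1.9) is easy to check numerically. The sketch below assumes, for illustration, the unnormalized Mexican-hat wavelet $\psi(t) = (1 - t^2)e^{-t^2/2}$, whose Fourier transform is $\Psi(\omega) = \sqrt{2\pi}\,\omega^2 e^{-\omega^2/2}$, so that $C_\psi = 2\pi\int|\omega|^3 e^{-\omega^2}d\omega = 2\pi$. The scale integral should return this same value for every $\omega \neq 0$. For contrast, the Gaussian $e^{-t^2/2}$ has $\Psi(0) \neq 0$ and is not admissible: its partial admissibility integrals keep growing as the lower frequency cutoff goes to zero.

```python
import math

def Psi_hat(w):
    """Fourier transform of the Mexican hat psi(t) = (1 - t^2) exp(-t^2 / 2)."""
    return math.sqrt(2 * math.pi) * w * w * math.exp(-w * w / 2)

def Psi_gauss(w):
    """Fourier transform of the Gaussian exp(-t^2 / 2); note Psi_gauss(0) != 0."""
    return math.sqrt(2 * math.pi) * math.exp(-w * w / 2)

def scale_integral(Psi, w, u_max=12.0, n=20000):
    """Midpoint-rule value of  int |Psi(a w)|^2 / |a| da  over all real a (w != 0).
    Assumes |Psi| is even (true for both examples), so we integrate over a > 0 and double.
    The range of a is chosen so that |a w| runs up to u_max."""
    a_max = u_max / abs(w)
    h = a_max / n
    total = sum(abs(Psi((i + 0.5) * h * w)) ** 2 / ((i + 0.5) * h) for i in range(n))
    return 2.0 * h * total

def partial_C(Psi, eps, s_max=math.log(12.0), n=20000):
    """int over eps <= |w| <= 12 of |Psi(w)|^2 / |w| dw, computed with w = exp(s)
    (so dw/w = ds); assumes |Psi| is even."""
    s_min = math.log(eps)
    h = (s_max - s_min) / n
    return 2.0 * h * sum(abs(Psi(math.exp(s_min + (i + 0.5) * h))) ** 2 for i in range(n))

for w in (0.5, 1.0, 3.0):
    print(w, scale_integral(Psi_hat, w))          # each should be close to 2*pi
print(partial_C(Psi_hat, 1e-6))                   # converges to C_psi = 2*pi
print([round(partial_C(Psi_gauss, e), 2) for e in (1e-2, 1e-4, 1e-6)])  # keeps growing
```

For the Gaussian, each hundredfold reduction of the cutoff adds roughly the same amount (about $2\pi\ln 10^4$), the logarithmic divergence that (5.1.2) rules out.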
There are possible variations on the reconstruction formula (5.1.4) if additional
constraints are imposed on the wavelet [75]. We restrict $a \in \mathbb{R}^+$, and if the following
modified admissibility condition is satisfied

$$C_\psi = \int_{0}^{\infty}\frac{|\Psi(\omega)|^2}{|\omega|}\,d\omega = \int_{-\infty}^{0}\frac{|\Psi(\omega)|^2}{|\omega|}\,d\omega, \qquad (5.1.10)$$

then (5.1.4) becomes

$$f(t) = \frac{1}{C_\psi}\int_{0}^{\infty}\int_{-\infty}^{\infty}CWT_f(a,b)\,\psi_{a,b}(t)\,\frac{da\,db}{a^2}.$$

For example, (5.1.10) is satisfied if the wavelet is real and admissible in the usual
sense given by (5.1.2).

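A generalization of the analysis/synthesis formulas involves two different wavelets:
$\psi_1(t)$ for analysis and $\psi_2(t)$ for synthesis, respectively. If the two wavelets
satisfy

$$\int_{-\infty}^{\infty}\frac{|\Psi_1(\omega)|\,|\Psi_2(\omega)|}{|\omega|}\,d\omega < \infty,$$

then the following reconstruction formula holds [73]:

$$f(t) = \frac{1}{C_{\psi_1,\psi_2}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\langle\psi_{1,a,b}, f\rangle\,\psi_{2,a,b}(t)\,\frac{da\,db}{a^2}, \qquad (5.1.11)$$

where $\psi_{i,a,b}(t)$ denotes $\psi_i(t)$ shifted and scaled as in (5.1.1), and
$C_{\psi_1,\psi_2} = \int(\Psi_1^*(\omega)\Psi_2(\omega)/|\omega|)\,d\omega$ (assumed nonzero). An interesting feature of (5.1.11) is that
$\psi_1(t)$ and $\psi_2(t)$ can have significantly different behavior, as we have seen with
biorthogonal systems in Section 4.6.1. For example, $\psi_1(t)$ could be compactly
supported but not $\psi_2(t)$, or one could be continuous and not the other.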

5.1.2 Properties 

The continuous wavelet transform possesses a number of properties which we will 
derive. Some are closely related to Fourier transform properties (for example, en- 
ergy conservation) while others are specific to the CWT (such as the reproducing 
kernel). Some of these properties are discussed in [124]. In the proofs we will 
assume that $\psi(t)$ is real.

Linearity The linearity of the CWT follows immediately from the linearity of the 
inner product. 
Figure 5.1 Shift property of the continuous wavelet transform. A shift of the
function leads to a shift of its wavelet transform. The shading in the (a, b) 
plane indicates the region of influence. 



Shift Property If $f(t)$ has a continuous wavelet transform given by $CWT_f(a,b)$,
then $f'(t) = f(t - b')$ leads to the following transform:²

$$CWT_{f'}(a,b) = CWT_f(a, b - b').$$

This follows since

$$CWT_{f'}(a,b) = \frac{1}{\sqrt{|a|}}\int_{-\infty}^{\infty}\psi\left(\frac{t-b}{a}\right)f(t-b')\,dt = \frac{1}{\sqrt{|a|}}\int_{-\infty}^{\infty}\psi\left(\frac{t'-(b-b')}{a}\right)f(t')\,dt' = CWT_f(a, b-b').$$



This shift invariance of the continuous transform is to be contrasted with the shift 
variance of the discrete-time wavelet series seen in Chapter 4. Figure 5.1 shows the 
shift property pictorially. 

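Scaling Property If $f(t)$ has $CWT_f(a,b)$ as its continuous wavelet transform,
then $f'(t) = (1/\sqrt{s})\,f(t/s)$ has the following transform:

$$CWT_{f'}(a,b) = CWT_f\left(\frac{a}{s}, \frac{b}{s}\right).$$

This follows since (with $s > 0$ and the change of variable $t = s\,t'$)

$$CWT_{f'}(a,b) = \frac{1}{\sqrt{|a|\,s}}\int_{-\infty}^{\infty}\psi\left(\frac{t-b}{a}\right)f\left(\frac{t}{s}\right)dt = \sqrt{\frac{s}{|a|}}\int_{-\infty}^{\infty}\psi\left(\frac{t'-b/s}{a/s}\right)f(t')\,dt' = CWT_f\left(\frac{a}{s}, \frac{b}{s}\right).$$

² In the following, $f'(t)$ denotes the modified function (rather than the derivative).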
Figure 5.2 The scaling property. (a) Scaling by a factor of 2. (b) Two squares
of constant energy in the wavelet-transform plane (after [238]).



The scaling property is shown in Figure 5.2(a). We chose $f'(t)$ such that it has the same energy as $f(t)$. Note that an elementary square in the CWT of $f'$, with the upper left corner $(a_0,b_0)$ and width $\varepsilon$, corresponds to an elementary square in the CWT of $f$ with the corner point $(a_0/s, b_0/s)$ and width $\varepsilon/s$, as shown in Figure 5.2(b). That is, assuming a scaling factor greater than 1, energy contained in a given region of the CWT of $f$ is spread by a factor of $s$ in both dimensions in the CWT of $f'$. Therefore, we have an intuitive explanation for the measure $(da\,db)/a^2$ used in the reconstruction formula (5.1.4), which weights elementary squares so that they contribute equal energy.
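Both properties can be checked numerically. The sketch below is only an illustration under our own assumptions: a Mexican-hat wavelet $\psi(t) = (1-t^2)e^{-t^2/2}$ (real, as assumed in the proofs), a hypothetical modulated-Gaussian test signal, $a > 0$, and Riemann-sum quadrature on a fine grid.

```python
import numpy as np

# Fine grid for Riemann-sum quadrature; the test signals decay long before |t| = 60.
T = np.linspace(-60.0, 60.0, 240_001)     # spacing 5e-4
DT = T[1] - T[0]

def mexican_hat(t):
    # Real, even example wavelet (our choice for illustration).
    return (1.0 - t**2) * np.exp(-t**2 / 2)

def cwt(f, a, b, psi=mexican_hat):
    # CWT_f(a, b) = (1/sqrt(a)) * integral psi((t - b)/a) f(t) dt, real wavelet, a > 0.
    return np.sum(psi((T - b) / a) * f(T)) * DT / np.sqrt(a)

def f(t):
    # Hypothetical test signal: a modulated Gaussian pulse.
    return np.exp(-(t - 1.0)**2) * np.cos(3.0 * t)

b_shift, s = 2.5, 3.0

def f_shifted(t):
    return f(t - b_shift)                  # f'(t) = f(t - b')

def f_scaled(t):
    return f(t / s) / np.sqrt(s)           # f'(t) = f(t/s)/sqrt(s), same energy as f

for a, b in [(0.5, 0.0), (1.0, 2.0), (2.0, -1.0)]:
    print(cwt(f_shifted, a, b), cwt(f, a, b - b_shift))   # shift property
    print(cwt(f_scaled, a, b), cwt(f, a / s, b / s))       # scaling property
```

Each printed pair should agree to many digits; the shift check is essentially a reindexed sum here because $b' = 2.5$ is a multiple of the grid spacing.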

Energy Conservation The CWT has an energy conservation property that is 
similar to Parseval's formula of the Fourier transform (2.4.12). 

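Proposition 5.2

Given $f(t) \in L^2(\mathbb{R})$ and its continuous wavelet transform $CWT_f(a,b)$, the following holds:

$$\int_{-\infty}^{\infty}|f(t)|^2\,dt = \frac{1}{C_\psi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}|CWT_f(a,b)|^2\,\frac{da\,db}{a^2}. \qquad (5.1.12)$$

Proof

From (5.1.5) we can write

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}|CWT_f(a,b)|^2\,\frac{da\,db}{a^2} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\left|\frac{\sqrt{|a|}}{2\pi}\int_{-\infty}^{\infty}\Psi^*(a\omega)F(\omega)e^{jb\omega}\,d\omega\right|^2\frac{da\,db}{a^2}.$$

Calling now $P(\omega) = \Psi^*(a\omega)F(\omega)$, we obtain that the above integral equals

$$\int_{-\infty}^{\infty}\frac{1}{|a|}\left(\int_{-\infty}^{\infty}\left|\frac{1}{2\pi}\int_{-\infty}^{\infty}P(\omega)e^{jb\omega}\,d\omega\right|^2db\right)da = \int_{-\infty}^{\infty}\frac{1}{|a|}\left(\frac{1}{2\pi}\int_{-\infty}^{\infty}|\Psi(a\omega)|^2|F(\omega)|^2\,d\omega\right)da, \qquad (5.1.13)$$

where we have again used Parseval's formula (2.4.12). Thus, (5.1.13) becomes

$$\frac{1}{2\pi}\int_{-\infty}^{\infty}|F(\omega)|^2\left(\int_{-\infty}^{\infty}\frac{|\Psi(a\omega)|^2}{|a|}\,da\right)d\omega. \qquad (5.1.14)$$

The second integral is equal to $C_\psi$ (see (5.1.9)). Applying Parseval's formula again, (5.1.14), and consequently (5.1.13), become

$$\frac{1}{C_\psi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}|CWT_f(a,b)|^2\,\frac{da\,db}{a^2} = \frac{1}{2\pi}\int_{-\infty}^{\infty}|F(\omega)|^2\,d\omega = \int_{-\infty}^{\infty}|f(t)|^2\,dt,$$

thus proving (5.1.12).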
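The step that turns (5.1.14) into $C_\psi\|f\|^2$ rests on the scale integral $\int|\Psi(a\omega)|^2/|a|\,da$ being the same for every $\omega \neq 0$. A quick numerical sketch, assuming the Mexican-hat wavelet $\psi(t) = (1-t^2)e^{-t^2/2}$, whose Fourier transform is $\sqrt{2\pi}\,\omega^2e^{-\omega^2/2}$ and whose constant works out to $C_\psi = 2\pi$; the wavelet and the midpoint-rule quadrature are our choices, not the text's.

```python
import numpy as np

def mexican_hat_ft(w):
    # Fourier transform of psi(t) = (1 - t^2) exp(-t^2/2).
    return np.sqrt(2.0 * np.pi) * w**2 * np.exp(-w**2 / 2)

def midpoint_grid(half_width, n):
    # Symmetric midpoint grid; n is even, so no node falls on 0 (where 1/|w| blows up).
    step = 2.0 * half_width / n
    return -half_width + step * (np.arange(n) + 0.5), step

def admissibility_constant(Psi, half_width=40.0, n=400_000):
    # C_psi = integral over all w of |Psi(w)|^2 / |w|.
    w, dw = midpoint_grid(half_width, n)
    return np.sum(np.abs(Psi(w))**2 / np.abs(w)) * dw

def scale_integral(Psi, w0, half_width=40.0, n=400_000):
    # Inner integral of (5.1.14): integral over all a != 0 of |Psi(a w0)|^2 / |a|.
    a, da = midpoint_grid(half_width, n)
    return np.sum(np.abs(Psi(a * w0))**2 / np.abs(a)) * da

C = admissibility_constant(mexican_hat_ft)
print(C, 2 * np.pi)                       # closed form for this wavelet: 2*pi
for w0 in (0.5, 1.0, 3.0):
    print(w0, scale_integral(mexican_hat_ft, w0))   # independent of w0
```

The substitution $u = a\omega$ explains the result: the scale integral is just $C_\psi$ written in another variable.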

Again, the importance of the admissibility condition (5.1.2) is evident. Also, the measure $(da\,db)/a^2$ used in the transform domain is consistent with our discussion of the scaling property. Scaling by $s$ while conserving the energy will spread the wavelet transform by $s$ in both dimensions $a$ and $b$, and thus a renormalization by $1/a^2$ is necessary.

A generalization of this energy conservation formula involves the inner product of two functions in time and in wavelet domains. Then, (5.1.12) becomes [73]

$$\int_{-\infty}^{\infty}f^*(t)\,g(t)\,dt = \frac{1}{C_\psi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}CWT_f^*(a,b)\,CWT_g(a,b)\,\frac{da\,db}{a^2}, \qquad (5.1.15)$$

that is, the usual inner product of the time-domain functions equals, up to a multiplicative constant, the inner product of their wavelet transforms, but with the measure $(da\,db)/a^2$.

Localization Properties The continuous wavelet transform has some localization 
properties, in particular sharp time localization at high frequencies (or small scales) 
which distinguishes it from more traditional, Fourier-like transforms. 

Time Localization Consider a Dirac pulse at time $t_0$, $\delta(t-t_0)$, and a wavelet $\psi(t)$. The continuous wavelet transform of the Dirac is

$$CWT_\delta(a,b) = \frac{1}{\sqrt{a}}\int_{-\infty}^{\infty}\psi\left(\frac{t-b}{a}\right)\delta(t-t_0)\,dt = \frac{1}{\sqrt{a}}\,\psi\left(\frac{t_0-b}{a}\right).$$

For a given scale factor $a_0$, that is, a horizontal line in the wavelet domain, the transform is equal to the scaled (and normalized) wavelet reversed in time and centered at the location of the Dirac. Figure 5.3(a) shows this localization for the compactly supported Haar wavelet (with zero phase). It is clear that for small $a$'s, the transform "zooms in" on the Dirac, with very good localization at very small scales. Figure 5.3(b) shows the case of a step function, which has a similar localization but a different magnitude behavior. Another example is given in Figure 5.4, where the transform of a simple synthetic signal with different singularities is shown.

Figure 5.3 Time localization property, shown for the case of a zero-phase Haar wavelet. (a) Behavior for $f(t) = \delta(t-t_0)$. The cone of influence has a width of $a_0/2$ on each side of $t_0$ and the height is $a_0^{-1/2}$. (b) Behavior for $f(t) = u(t-t_0)$, that is, the unit-step function. The cone of influence is as in part (a), but the height is $-\frac{1}{2}a_0^{1/2}$.



Frequency Localization For the sake of discussion, we will consider the sinc wavelet, that is, a perfect bandpass filter. Its magnitude spectrum is 1 for $|\omega|$ between $\pi$ and $2\pi$. Consider a complex sinusoid of unit magnitude and at frequency $\omega_0$. The highest-frequency wavelet that will pass the sinusoid through has a scale factor $a_{min} = \pi/\omega_0$ (and a gain of $\sqrt{\pi/\omega_0}$), while the lowest-frequency wavelet passing the sinusoid is for $a_{max} = 2\pi/\omega_0$ (and a gain of $\sqrt{2\pi/\omega_0}$). Figure 5.5(a) shows the various octave-band filters, and Figure 5.5(b) shows the continuous wavelet transform of a sinusoid using a sinc wavelet.

The frequency resolution using an octave-band filter is limited, especially at high frequencies. An improvement is obtained by going to narrower bandpass filters (third of an octave, for example).
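For the sinc wavelet, the scale range follows in one line from the frequency-domain expression of the CWT used in the proof of Proposition 5.2. Taking $f(t) = e^{j\omega_0 t}$ with $\omega_0 > 0$, so that $F(\omega) = 2\pi\,\delta(\omega-\omega_0)$ in the distributional sense, and $a > 0$:

$$CWT_f(a,b) = \frac{\sqrt{a}}{2\pi}\int_{-\infty}^{\infty}\Psi^*(a\omega)\,2\pi\,\delta(\omega-\omega_0)\,e^{jb\omega}\,d\omega = \sqrt{a}\,\Psi^*(a\omega_0)\,e^{j\omega_0 b}.$$

Since $|\Psi(\omega)| = 1$ for $\pi \le |\omega| \le 2\pi$ and $0$ otherwise, $|CWT_f(a,b)| = \sqrt{a}$ for $\pi/\omega_0 \le a \le 2\pi/\omega_0$ and $0$ elsewhere, for every $b$. This is the band of nonzero magnitude in Figure 5.5(b), with the gain growing from $\sqrt{\pi/\omega_0}$ to $\sqrt{2\pi/\omega_0}$ across it.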



Characterization of Regularity In our discussion of time localization (see Figures 5.3 and 5.4), we saw the "zooming" property of the wavelet transform. This allows a characterization of local regularity of signals, a feature which makes the wavelet transform more attractive than the Fourier or local Fourier transform. Indeed, while global regularity of a function can be measured from the decay of its Fourier transform, little can be said about the local behavior. For example, a single discontinuity in an otherwise smooth function will produce an order $1/|\omega|$ decay of its Fourier transform (as an example, consider the step function). The local Fourier transform is able to indicate local regularity within a window, but not more locally. The wavelet transform, because of the zooming property, will isolate the discontinuity from the rest of the function, and the behavior of the wavelet transform in the neighborhood of the discontinuity will characterize it.

Figure 5.4 Continuous wavelet transform of a simple signal using the Haar wavelet. (a) Signal containing four singularities. (b) Continuous wavelet transform, with small scales toward the front. Note the different behavior at the different singularities and the good time localization at small scales.

Figure 5.5 Frequency localization of the continuous wavelet transform using a sinc wavelet. (a) Magnitude spectrum of the wavelet and its scaled versions involved in the resolution of a complex sinusoid at $\omega_0$. (b) Nonzero magnitude of the continuous wavelet transform.

Consider the wavelet transform of a Dirac impulse in Figure 5.3(a) and of a step function in Figure 5.3(b). In the former case, the absolute value of the wavelet transform behaves as $|a|^{-1/2}$ when approaching the Dirac. In the latter case, it is easy to verify that the wavelet transform, using a Haar wavelet (with zero phase), is equal to a hat function (a triangle) of height $-\frac{1}{2}a_0^{1/2}$ and width from $t_0 - a_0/2$ to $t_0 + a_0/2$. Along the line $a = a_0$, the CWT in Figure 5.3(a) is simply the derivative, with respect to $b$, of the CWT in Figure 5.3(b). This follows from the fact that the CWT can be written as a convolution of the signal with a scaled and time-reversed wavelet. From the differentiation property of the convolution and from the fact that the Dirac is the derivative of the step function (in the sense of distributions), the result follows. In Figure 5.4, we saw the different behavior of the continuous wavelet transform for different singularities, as scale becomes small. A more thorough discussion of the characterization of local regularity can be found in [73, 183] (see also Problem 5.1).
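The hat-function claim and the derivative relation can be checked numerically. The sketch assumes the zero-phase Haar wavelet equal to $+1$ on $[-1/2, 0)$ and $-1$ on $[0, 1/2)$ (this sign convention is our assumption; it reproduces the negative peak stated above), computes the CWT of a step by Riemann sums, and compares a central difference in $b$ with the closed-form Dirac transform $a^{-1/2}\psi((t_0-b)/a)$.

```python
import numpy as np

def haar_zero_phase(t):
    # Zero-phase Haar wavelet (assumed convention): +1 on [-1/2, 0), -1 on [0, 1/2), 0 elsewhere.
    return (np.where((t >= -0.5) & (t < 0.0), 1.0, 0.0)
            - np.where((t >= 0.0) & (t < 0.5), 1.0, 0.0))

DT = 1e-4
T = np.arange(-5.0, 5.0, DT)                     # integration grid

def cwt(f_samples, a, b, psi=haar_zero_phase):
    # Riemann-sum approximation of (1/sqrt(a)) * integral psi((t - b)/a) f(t) dt, a > 0.
    return np.sum(psi((T - b) / a) * f_samples) * DT / np.sqrt(a)

def cwt_dirac(a, b, t0, psi=haar_zero_phase):
    # Closed form for f(t) = delta(t - t0): (1/sqrt(a)) * psi((t0 - b)/a).
    return psi((t0 - b) / a) / np.sqrt(a)

t0, a = 0.3, 0.5
step = np.where(T >= t0, 1.0, 0.0)               # u(t - t0) sampled on the grid

peak = cwt(step, a, t0)                          # hat height, expected -sqrt(a)/2
ends = (cwt(step, a, t0 - a / 2), cwt(step, a, t0 + a / 2))   # hat endpoints, expected 0

# A central difference in b should reproduce the Dirac transform on the line a = const.
h = 0.05
for b in (t0 - 0.1, t0 + 0.1):
    slope = (cwt(step, a, b + h) - cwt(step, a, b - h)) / (2 * h)
    print(b, slope, cwt_dirac(a, b, t0))
print(peak, -np.sqrt(a) / 2, ends)
```

The differences from the exact values are of the order of the grid spacing, since the step and the wavelet breakpoints do not fall exactly on grid points.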

Reproducing Kernel As indicated earlier, the CWT is a very redundant representation since it is a two-dimensional expansion of a one-dimensional function. Consider the space $V$ of square-integrable functions over the plane $(a,b)$ with respect to $(da\,db)/a^2$. Obviously, only a subspace $H$ of $V$ corresponds to wavelet transforms of functions from $L^2(\mathbb{R})$.

Proposition 5.3

If a function $F(a,b)$ belongs to $H$, that is, it is the wavelet transform of a function $f(t)$, then $F(a,b)$ satisfies

$$F(a_0,b_0) = \frac{1}{C_\psi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}K(a_0,b_0,a,b)\,F(a,b)\,\frac{da\,db}{a^2}, \qquad (5.1.16)$$

where

$$K(a_0,b_0,a,b) = \langle\psi_{a_0,b_0},\psi_{a,b}\rangle$$

is the reproducing kernel.

Proof

To prove (5.1.16), note that $K(a_0,b_0,a,b)$ is the complex conjugate of the wavelet transform of $\psi_{a_0,b_0}$ at $(a,b)$,

$$K(a_0,b_0,a,b) = CWT^*_{\psi_{a_0,b_0}}(a,b), \qquad (5.1.17)$$

since $\langle\psi_{a_0,b_0},\psi_{a,b}\rangle = \langle\psi_{a,b},\psi_{a_0,b_0}\rangle^*$. Since $F(a,b) = CWT_f(a,b)$ by assumption and using (5.1.17), the right side of (5.1.16) can be written as

$$\frac{1}{C_\psi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}K(a_0,b_0,a,b)\,F(a,b)\,\frac{da\,db}{a^2} = \frac{1}{C_\psi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}CWT^*_{\psi_{a_0,b_0}}(a,b)\,CWT_f(a,b)\,\frac{da\,db}{a^2} = \langle\psi_{a_0,b_0},f\rangle = CWT_f(a_0,b_0) = F(a_0,b_0),$$

where (5.1.15) was used to come back to the time domain.

Of course, since $K(a_0,b_0,a,b)$ is the wavelet transform of $\psi_{a,b}$ at location $(a_0,b_0)$, it indicates the correlation across shifts and scales of the wavelet $\psi$.

We just showed that if a two-dimensional function is a continuous wavelet transform of a function, then it satisfies the reproducing kernel relation (5.1.16). It can be shown that the converse is true as well, that is, if a function $F(a,b)$ satisfies (5.1.16), then there is a function $f(t)$ and a wavelet $\psi(t)$ such that $F(a,b) = CWT_f(a,b)$ [238]. Therefore, $F(a,b)$ is a CWT if and only if it satisfies the reproducing kernel relation (5.1.16).
Figure 5.6 Reproducing kernel of the Haar wavelet.

Figure 5.7 Morlet wavelet. (a) Time domain (real and imaginary parts are the continuous and dotted graphs, respectively). (b) Magnitude spectrum.



An example of a reproducing kernel, that is, the wavelet transform of itself (the 
wavelet is real), is shown in Figure 5.6 for the Haar wavelet. Note that because of 
the orthogonality of the wavelet with respect to the dyadic grid, the reproducing 
kernel is zero at the dyadic grid points. 
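The zeros of the Haar kernel on the dyadic grid can be reproduced numerically. The sketch assumes the standard Haar wavelet supported on $[0, 1)$, for which the dyadic family $\psi_{2^m,\,n2^m}$ is orthonormal (Figure 5.6 may use a shifted version), takes $(a_0, b_0) = (1, 0)$, and evaluates $K(1, 0, a, b) = \langle\psi_{1,0},\psi_{a,b}\rangle$ with Riemann sums that are exact here because every breakpoint falls on the grid.

```python
import numpy as np

def haar(t):
    # Standard Haar wavelet (assumption): +1 on [0, 1/2), -1 on [1/2, 1).
    return (np.where((t >= 0.0) & (t < 0.5), 1.0, 0.0)
            - np.where((t >= 0.5) & (t < 1.0), 1.0, 0.0))

DT = 2.0**-12
T = np.arange(-64.0, 64.0, DT)      # dyadic grid: all breakpoints used below are grid points

def psi_ab(a, b):
    return haar((T - b) / a) / np.sqrt(a)

def kernel(a0, b0, a, b):
    # K(a0, b0, a, b) = <psi_{a0,b0}, psi_{a,b}>; the wavelet is real, so no conjugate is needed.
    return np.sum(psi_ab(a0, b0) * psi_ab(a, b)) * DT

# Kernel around (a0, b0) = (1, 0), sampled on the dyadic grid a = 2**m, b = n * 2**m.
for m in range(-2, 3):
    print(m, [round(float(kernel(1.0, 0.0, 2.0**m, n * 2.0**m)), 6) for n in range(-2, 3)])
print(kernel(1.0, 0.0, 1.0, 0.5))   # an off-grid shift: the kernel is not zero there
```

Only the entry $(m, n) = (0, 0)$ should equal 1; every other dyadic sample vanishes, while off the grid the kernel generally does not.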



5.1.3 Morlet Wavelet 



The classic example of a continuous-time wavelet analysis uses a windowed complex exponential as the prototype wavelet. This is the Morlet wavelet, as first proposed in [119, 125] for signal analysis, and given by

$$\psi(t) = \frac{1}{\sqrt{2\pi}}\,e^{j\omega_0 t}\,e^{-t^2/2}, \qquad (5.1.18)$$

$$\Psi(\omega) = e^{-(\omega-\omega_0)^2/2}.$$

The factor $1/\sqrt{2\pi}$ in (5.1.18) normalizes the spectrum so that its peak value is $\Psi(\omega_0) = 1$. The center frequency $\omega_0$ is usually chosen such that the second maximum of $\mathrm{Re}\{\psi(t)\}$, $t > 0$, is half the first one (at $t = 0$). This leads to

$$\omega_0 = \pi\sqrt{\frac{2}{\ln 2}} \approx 5.336.$$

It should be noted that this wavelet is not admissible since $\Psi(\omega)|_{\omega=0} \neq 0$, but its value at zero frequency is negligible ($\approx 7\cdot 10^{-7}$), so it does not present any problem in practice. The Morlet wavelet can be corrected so that $\Psi(0) = 0$, but the correction term is very small. Figure 5.7 shows the Morlet wavelet in time and frequency. The latter graph shows that the Morlet wavelet is roughly an octave-band filter. Displays of signal analyses using the continuous-time wavelet transform are often called scalograms, in contrast to spectrograms, which are based on the short-time Fourier transform.
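The numbers quoted for the Morlet wavelet can be reproduced directly. The sketch below computes $\omega_0 = \pi\sqrt{2/\ln 2}$, checks that the Gaussian envelope equals $1/2$ at $t = 2\pi/\omega_0$ (where $\cos(\omega_0 t)$ next returns to 1), locates the actual second maximum of $\mathrm{Re}\{\psi(t)\}$ on a fine grid (it lies slightly to the left of $2\pi/\omega_0$, so the ratio comes out close to, but a little above, one half), and evaluates $\Psi(0) = e^{-\omega_0^2/2}$.

```python
import numpy as np

w0 = np.pi * np.sqrt(2.0 / np.log(2.0))      # center frequency from the half-amplitude rule

def morlet(t, w0=w0):
    # psi(t) = (1/sqrt(2 pi)) exp(j w0 t) exp(-t^2 / 2), as in (5.1.18).
    return np.exp(1j * w0 * t - t**2 / 2) / np.sqrt(2 * np.pi)

# The rule fixes the Gaussian envelope at t = 2 pi / w0, where cos(w0 t) returns to 1.
envelope_ratio = np.exp(-(2 * np.pi / w0)**2 / 2)

# The actual second local maximum of Re{psi(t)}, searched on a fine grid in [0.5, 2].
t = np.linspace(0.5, 2.0, 150_001)
second_max = morlet(t).real.max() / morlet(0.0).real

psi_hat_at_zero = np.exp(-w0**2 / 2)          # Psi(0) for Psi(w) = exp(-(w - w0)^2 / 2)
print(w0, envelope_ratio, second_max, psi_hat_at_zero)
```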

5.2 Continuous Short-Time Fourier Transform 

This transform, also called the windowed Fourier or Gabor transform, was briefly introduced in Section 2.6.3. The idea is that of a "localization" of the Fourier transform, using an appropriate window function centered around a location of interest (which can be moved). Thus, like the wavelet transform, it is an expansion along two parameters, frequency and time shift. However, it has a different behavior because of the fixed window size, as opposed to the scaled window used in the wavelet transform.

5.2.1 Properties 

In the short-time Fourier transform (STFT) case, the functions used in the expansion are obtained by shifts and modulates of a basic window function $w(t)$

$$g_{\omega,\tau}(t) = e^{j\omega t}\,w(t-\tau). \qquad (5.2.1)$$

This leads to an expansion of the form

$$STFT_f(\omega,\tau) = \int_{-\infty}^{\infty}e^{-j\omega t}\,w^*(t-\tau)\,f(t)\,dt = \langle g_{\omega,\tau}(t), f(t)\rangle.$$
There is no admissibility constraint on the window (unlike (5.1.2)) since it is sufficient for the window to have finite energy. It is convenient to choose the window such that $\|w(t)\| = 1$, and we will also assume that $w(t)$ is absolutely integrable, which is the case in practice.

Similarly to the wavelet case, the function $f(t)$ can be recovered, in the $L^2$ sense, by a double integral

$$f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}STFT_f(\omega,\tau)\,g_{\omega,\tau}(t)\,d\omega\,d\tau, \qquad (5.2.2)$$

where $\|w(t)\| = 1$ was assumed (otherwise, a factor $1/\|w(t)\|^2$ has to be used). The proof of (5.2.2) can be done by introducing

$$f_A(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-A}^{A}STFT_f(\omega,\tau)\,g_{\omega,\tau}(t)\,d\omega\,d\tau$$

and showing that $\lim_{A\to\infty}f_A(t) = f(t)$ in $L^2(\mathbb{R})$ (see [108] for a detailed proof).
There is also an energy conservation property for the STFT. 

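Proposition 5.4

Given $f(t) \in L^2(\mathbb{R})$ and its short-time Fourier transform $STFT_f(\omega,\tau)$, the following holds:

$$\|f(t)\|^2 = \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}|STFT_f(\omega,\tau)|^2\,d\omega\,d\tau.$$

Proof

First, using Parseval's formula, let us write the STFT in Fourier domain as

$$STFT_f(\Omega,\tau) = \int_{-\infty}^{\infty}g^*_{\Omega,\tau}(t)\,f(t)\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty}G^*_{\Omega,\tau}(\omega)\,F(\omega)\,d\omega, \qquad (5.2.3)$$

where

$$G_{\Omega,\tau}(\omega) = e^{-j(\omega-\Omega)\tau}\,W(\omega-\Omega) \qquad (5.2.4)$$

and $W(\omega)$ is the Fourier transform of $w(t)$. Using (5.2.4) in (5.2.3), we obtain

$$STFT_f(\Omega,\tau) = \frac{1}{2\pi}\,e^{-j\Omega\tau}\int_{-\infty}^{\infty}W^*(\omega-\Omega)\,F(\omega)\,e^{j\omega\tau}\,d\omega = e^{-j\Omega\tau}\,\mathcal{F}^{-1}[W^*(\omega-\Omega)F(\omega)](\tau),$$

where $\mathcal{F}^{-1}[\cdot](\tau)$ is the inverse Fourier transform at $\tau$. Therefore,

$$\frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}|STFT_f(\Omega,\tau)|^2\,d\Omega\,d\tau = \frac{1}{2\pi}\int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty}\left|\mathcal{F}^{-1}[W^*(\omega-\Omega)F(\omega)](\tau)\right|^2d\tau\right)d\Omega = \frac{1}{2\pi}\int_{-\infty}^{\infty}\left(\frac{1}{2\pi}\int_{-\infty}^{\infty}|W^*(\omega-\Omega)F(\omega)|^2\,d\omega\right)d\Omega, \qquad (5.2.5)$$

where we used Parseval's relation. Interchanging the order of integration (it can be shown that $W^*(\omega-\Omega)F(\omega)$ is in $L^2(\mathbb{R})$), (5.2.5) becomes

$$\int_{-\infty}^{\infty}\frac{1}{2\pi}|F(\omega)|^2\left(\frac{1}{2\pi}\int_{-\infty}^{\infty}|W(\omega-\Omega)|^2\,d\Omega\right)d\omega = \frac{1}{2\pi}\int_{-\infty}^{\infty}|F(\omega)|^2\,d\omega = \|f(t)\|^2,$$

where we used the fact that $\|w(t)\|^2 = 1$ or $\|W(\omega)\|^2 = 2\pi$.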
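Proposition 5.4 can be checked on sampled signals. The sketch below rests on our own assumptions (a unit-norm Gaussian window with $\alpha = 2$, a hypothetical two-pulse test signal, uniform grids in $t$ and $\tau$) and evaluates the $\omega$-integral for each $\tau$ from DFT samples of $w(t-\tau)f(t)$. The DFT magnitude is unaffected by the phase factor caused by the grid origin, so the double integral divided by $2\pi$ should match $\|f\|^2$.

```python
import numpy as np

dt = 0.01
t = np.arange(-20.0, 20.0, dt)            # time grid, wide enough for the signals below

def window(t, alpha=2.0):
    # Gaussian window (5.2.6) with beta chosen so that ||w|| = 1 (hypothetical alpha).
    beta = (2.0 * alpha / np.pi) ** 0.25
    return beta * np.exp(-alpha * t**2)

# Hypothetical test signal: two Gaussian pulses, one of them modulated.
f = np.exp(-(t - 1.0)**2 / 4) * np.cos(5 * t) + 0.5 * np.exp(-(t + 3.0)**2)

def stft_energy(f, taus):
    # Integral of |STFT_f(omega, tau)|^2 over omega and tau. For each tau the omega-integral
    # uses DFT samples of w(t - tau) f(t), spaced d_omega = 2 pi / (len(t) dt) apart.
    d_tau = taus[1] - taus[0]
    d_omega = 2 * np.pi / (len(t) * dt)
    total = 0.0
    for tau in taus:
        spectrum = np.fft.fft(window(t - tau) * f) * dt   # spectrum samples, up to a unit-modulus phase
        total += np.sum(np.abs(spectrum)**2) * d_omega * d_tau
    return total

energy_time = np.sum(f**2) * dt
energy_tf = stft_energy(f, np.arange(-20.0, 20.0, 0.05)) / (2 * np.pi)
print(energy_time, energy_tf)
```

The structure mirrors the proof: Parseval in $\omega$ for each $\tau$, followed by $\int|w(t-\tau)|^2\,d\tau = 1$.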

5.2.2 Examples 

Since the STFT is a local Fourier transform, any classic window that is used in Fourier analysis of signals is a suitable window function. A rectangular window will have poor frequency localization, so smoother windows are preferred. For example, a triangular window has a spectrum decaying in $1/\omega^2$ and is already a better choice. Smoother windows have been designed for data analysis, such as the Hanning window [211]:

$$w(t) = \begin{cases}[1+\cos(2\pi t/T)]/2, & t \in [-T/2, T/2],\\ 0, & \text{otherwise.}\end{cases}$$

The classic window, originally used by Gabor, is the Gaussian window

$$w(t) = \beta e^{-\alpha t^2}, \qquad \alpha, \beta > 0, \qquad (5.2.6)$$

where $\alpha$ controls the width, or spread, in time and $\beta$ is a normalization factor. Its Fourier transform $W(\omega)$ is given by

$$W(\omega) = \beta\sqrt{\frac{\pi}{\alpha}}\,e^{-\omega^2/4\alpha}.$$

Modulates of a Gaussian window (see (5.2.1)) are often called Gabor functions. An 
attractive feature of the Gaussian window is that it achieves the best joint time 
and frequency localization since it meets the lower bound set by the uncertainty 
principle (see Section 2.6.2). 
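The optimality of the Gaussian window can be compared against the Hanning window. The sketch below uses energy-normalized spreads, $\Delta_t^2 = \int t^2|w|^2\,dt/\int|w|^2\,dt$ and $\Delta_\omega^2 = \int\omega^2|W|^2\,d\omega/\int|W|^2\,d\omega$ (evaluated in time as $\int|w'|^2\,dt/\int|w|^2\,dt$ by Parseval, which is valid here because both windows are real), for which the uncertainty bound reads $\Delta_t\Delta_\omega \ge 1/2$. This normalization is our assumption and may differ by a constant from the one in Section 2.6.2. The Gaussian should sit on the bound and the Hanning window slightly above it (about 0.513).

```python
import numpy as np

DT = 1e-4
t = np.arange(-5.0, 5.0, DT)              # both windows are negligible outside |t| < 5

def spreads(w):
    # Energy-normalized time and frequency spreads of a real window sampled on t.
    # Frequency spread via Parseval: int omega^2 |W|^2 d omega / int |W|^2 d omega
    # equals int |w'|^2 dt / int |w|^2 dt.
    energy = np.sum(w**2) * DT
    mean_t = np.sum(t * w**2) * DT / energy
    var_t = np.sum((t - mean_t)**2 * w**2) * DT / energy
    var_w = np.sum(np.gradient(w, DT)**2) * DT / energy
    return np.sqrt(var_t), np.sqrt(var_w)

alpha = 1.5                                # hypothetical width parameter in (5.2.6)
gauss = np.exp(-alpha * t**2)              # beta does not affect the spreads
T_hann = 1.0                               # Hanning window length T (hypothetical)
hann = np.where(np.abs(t) <= T_hann / 2, (1 + np.cos(2 * np.pi * t / T_hann)) / 2, 0.0)

for name, w in (("Gaussian", gauss), ("Hanning", hann)):
    st, sw = spreads(w)
    print(name, st, sw, st * sw)           # the product is bounded below by 1/2
```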

It is interesting to see that Gabor functions and the Morlet wavelet (see (5.1.18)) are related, since they are both modulated Gaussian windows. That is, given a certain $\alpha$ in (5.2.6) and a certain $\omega_0$ in (5.1.18), we have that $\psi_{a,0}(t)$, using the Morlet wavelet, is (we assume zero time shift for simplicity)

$$\psi_{a,0}(t) = \frac{1}{\sqrt{2\pi a}}\,e^{j\omega_0 t/a}\,e^{-t^2/2a^2},$$

while $g_{\omega,0}(t)$, using the Gabor window, is

$$g_{\omega,0}(t) = \beta\,e^{j\omega t}\,e^{-\alpha t^2},$$

that is, they are equal (up to a normalization constant) if $a = 1/\sqrt{2\alpha}$ and $\omega = \omega_0\sqrt{2\alpha}$. Therefore, there is a frequency and a scale at which the Gabor and wavelet transforms coincide. At others, the analysis is different since the wavelet transform uses variable-size windows, as opposed to the fixed-size window of the local Fourier analysis.

This points to a key design question in the STFT, namely the choice of the 
window size. Once the window size is chosen, all frequencies will be analyzed 
with the same time and frequency resolutions, unlike what happens in the wavelet 
transform. In particular, events cannot be resolved if they appear close to each 
other (within the window spread). 

As far as regularity of functions is concerned, one can use Fourier techniques 
which will indicate regularity estimates within a window. However, it will not be 
possible to distinguish different behaviors within a window spread. An alternative 
is to use STFTs with multiple window sizes (see [291] for such a generalized STFT).

5.3 Frames of Wavelet and Short-Time Fourier Transforms 

In Chapter 3, we considered discrete-time orthonormal bases as well as overcomplete expansions. For the latter, we pointed out some advantages of relaxing the sampling constraints: As the oversampling factor increases, we get more and more freedom in choosing our basis functions, that is, we can get better filters. In Chapter 4, orthonormal wavelet bases for continuous-time signals were discussed, while at the beginning of this chapter, the continuous-time wavelet and short-time Fourier transforms, that is, very redundant representations, were introduced.

Our aim in this section is to review overcomplete continuous-time expansions called frames. They are sets of nonindependent vectors that are able to represent every vector in a given space and can be obtained by discretizing the continuous-time transforms (both wavelet and short-time Fourier transforms). We will see that a frame condition is necessary if we want a numerically stable reconstruction of a function $f$ from a sequence of its transform coefficients (that is, $(\langle\psi_{m,n},f\rangle)_{m,n\in\mathbb{Z}}$ in the wavelet transform case, and $(\langle g_{m,n},f\rangle)_{m,n\in\mathbb{Z}}$ in the short-time Fourier transform case).³ Therefore, the material in this section can be seen as the continuous-time counterpart of overcomplete expansions seen briefly in Section 3.5, as well as a "middle ground" between two extreme cases: Nonredundant orthonormal bases of Chapter 4 and extremely redundant continuous-time wavelet and short-time Fourier transforms at the beginning of this chapter. As in Chapter 3, there will be a trade-off between oversampling and freedom in choosing our basis functions. In the most extreme case, for the short-time Fourier transform frames, the Balian-Low theorem tells us that when critical (Nyquist) sampling is used, it will not be possible to obtain frames with good time and frequency resolutions (and consequently, orthonormal



³ Round brackets are used to denote sequences of coefficients.
short-time Fourier transform bases will not be achievable with basis functions being
well localized in time and frequency). On the other hand, wavelet frames are less 
restricted and this is one of the reasons behind the excitement that wavelets have 
generated over the past few years. 

A fair amount of the material in this section follows Daubechies's book [73]. For 
more details and a more rigorous mathematical presentation, the reader is referred 
to [73], as well as to [26, 72] for more advanced material. 

5.3.1 Discretization of the Continuous-Time Wavelet 
and Short-Time Fourier Transforms 

As we have seen previously, the continuous-time wavelet transform employs basis functions given by (5.1.1) where $b \in \mathbb{R}$, $a \in \mathbb{R}^+$, $a \neq 0$, and the reconstruction formula is based on a double integral, namely the resolution of the identity given by (5.1.4). However, we would like to be able to reconstruct the function from samples taken on a discrete grid. To that end, we choose the following discretization of the scaling parameter $a$: $a = a_0^m$, with $m \in \mathbb{Z}$ and $a_0 \neq 1$. As for the shift $b$, consider the following: For $m = 0$, discretize $b$ by taking integer multiples of a fixed $b_0$ ($b_0 > 0$). The step $b_0$ should be chosen in such a way that $\psi(t - nb_0)$ will "cover" the whole time axis. Now, the step size $b$ at scale $m$ cannot be chosen independently of $m$, since the basis functions are rescaled. If we define the "width" of the function, $\Delta_t(f)$, as in (2.6.1), then one can see that the width of $\psi_{a_0^m,0}(t)$ is $a_0^m$ times the width of $\psi(t)$, that is

$$\Delta_t(\psi_{a_0^m,0}(t)) = a_0^m\,\Delta_t(\psi(t)).$$

Then, it is obvious that for $\psi_{a,b}(t)$ to "cover" the whole axis at a scale $a = a_0^m$, the shift has to be $b = nb_0a_0^m$. Therefore, we choose the following discretization:

$$a = a_0^m, \quad b = nb_0a_0^m, \qquad m, n \in \mathbb{Z}, \quad a_0 > 1, \quad b_0 > 0.$$

The discretized family of wavelets is now

$$\psi_{m,n}(t) = a_0^{-m/2}\,\psi(a_0^{-m}t - nb_0).$$

As illustrated in Figure 5.8, to different values of $m$ correspond wavelets of different widths: Narrow, high-frequency wavelets are translated by smaller steps in order to "cover" the whole axis, while wider, lower-frequency wavelets are translated by larger steps. For $a_0 = 2$, $b_0 = 1$, we obtain the dyadic case introduced in Chapter 4, for which we know that orthonormal bases exist and reconstruction from transform coefficients is possible.
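The sampling rule $a = a_0^m$, $b = nb_0a_0^m$ is simple to tabulate. The sketch below, with helper names of our own choosing, lists the grid points that fall in a time interval for the case drawn in Figure 5.8 ($a_0 = 2^{1/2}$, $b_0 = 1$) and builds $\psi_{m,n}(t)$ for any wavelet passed in. The number of shifts per interval shrinks roughly by a factor $a_0$ per scale, which is the "covering" argument above.

```python
import numpy as np

def wavelet_grid(a0, b0, scales, t_range, eps=1e-9):
    # Sampling points (m, n, a, b) with a = a0**m and b = n*b0*a0**m inside t_range.
    lo, hi = t_range
    points = []
    for m in scales:
        a = a0**m
        step = b0 * a                          # the shift step grows with the scale
        n_lo = int(np.ceil(lo / step - eps))   # eps guards against round-off in a0**m
        n_hi = int(np.floor(hi / step + eps))
        points.extend((m, n, a, n * step) for n in range(n_lo, n_hi + 1))
    return points

def psi_mn(psi, a0, b0, m, n, t):
    # Discretized wavelet a0**(-m/2) * psi(a0**(-m) * t - n*b0).
    return a0**(-m / 2) * psi(a0**(-m) * t - n * b0)

a0, b0 = np.sqrt(2.0), 1.0                     # the case drawn in Figure 5.8
grid = wavelet_grid(a0, b0, range(-2, 3), (0.0, 8.0))
for m in range(-2, 3):
    print(m, sum(1 for p in grid if p[0] == m), "shifts in [0, 8]")
```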

We would like to answer the following question: Given the sequence of transform coefficients $(\langle\psi_{m,n},f\rangle)$, is it possible to reconstruct $f$ in a numerically stable way?



Figure 5.8 By discretizing the values of dilation and shift parameters $a = a_0^m$, $b = nb_0a_0^m$, one obtains (a) the sampling grid and (b) the corresponding set of functions (the case $a_0 = 2^{1/2}$, $b_0 = 1$, is shown). To different values of $m$ correspond wavelets of different width: Shorter, high-frequency wavelets are translated by smaller steps, while wider, low-frequency wavelets are translated by larger steps.



In the continuous-parameter case, this is answered by using the resolution of the identity. When the parameters are discretized, there is no equivalent formula. However, in what follows, it will be shown that reconstruction is indeed possible, that is, for certain $\psi$ and appropriate $a_0$, $b_0$, there exist $\tilde{\psi}_{m,n}$ such that the function $f$ can be reconstructed as follows:

$$f = \sum_{m,n}\langle\psi_{m,n},f\rangle\,\tilde{\psi}_{m,n}.$$

It is also intuitively clear that when $a_0$ is close to one, and $b_0$ is close to zero, reconstruction should be possible by using the resolution of the identity (since the double sum will become a close approximation to the double integral used in the resolution of the identity). Also, as we said earlier, we know that for some choices of $a_0$ and $b_0$ (such as the dyadic case and orthonormal bases in general), reconstruction is possible as well. What we want to explore are the cases in between.

Let us now see what is necessary in order to have a stable reconstruction. Intuitively, the operator that maps a function $f(t)$ into coefficients $\langle\psi_{m,n},f\rangle$ has to be bounded. That is, if $f(t) \in L^2(\mathbb{R})$, then $\sum_{m,n}|\langle\psi_{m,n},f\rangle|^2$ has to be finite. Also, no $f(t)$ with $\|f\| > 0$ should be mapped to 0. These two conditions lead to frame bounds which guarantee stable reconstruction. Consider the first condition. For any wavelet with some decay in time and frequency, having zero mean, and any choice for $a_0 > 1$, $b_0 > 0$, it can be shown that

$$\sum_{m,n}|\langle\psi_{m,n},f\rangle|^2 \le B\|f\|^2 \qquad (5.3.1)$$

(this just states that the sequence $(\langle\psi_{m,n},f\rangle)_{m,n}$ is in $l^2(\mathbb{Z}^2)$, that is, the sequence is square-summable [73]). On the other hand, the requirement for stable reconstruction means that if $\sum_{m,n}|\langle\psi_{m,n},f\rangle|^2$ is small, $\|f\|^2$ should be small as well (that is, $\sum_{m,n}|\langle\psi_{m,n},f\rangle|^2$ should be "close" to $\|f\|^2$). This further means that there should exist $\alpha < \infty$ such that $\sum_{m,n}|\langle\psi_{m,n},f\rangle|^2 \le 1$ implies $\|f\|^2 \le \alpha$. Take now an arbitrary $f$ and define

$$\tilde{f} = \left[\sum_{m,n}|\langle\psi_{m,n},f\rangle|^2\right]^{-1/2}f.$$

Then it is obvious that $\sum_{m,n}|\langle\psi_{m,n},\tilde{f}\rangle|^2 \le 1$ and consequently, $\|\tilde{f}\|^2 \le \alpha$. This is equivalent to

$$A\|f\|^2 \le \sum_{m,n}|\langle\psi_{m,n},f\rangle|^2, \qquad (5.3.2)$$

for some $A = 1/\alpha$. Take now $f = f_1 - f_2$. Then, (5.3.2) means also that the distance $\|f_1 - f_2\|$ cannot be arbitrarily large if $\sum_{m,n}|\langle\psi_{m,n},f_1\rangle - \langle\psi_{m,n},f_2\rangle|^2$ is small, or, (5.3.2) is equivalent to the stability requirement. Putting (5.3.1) and (5.3.2) together tells us that a numerically stable reconstruction of $f$ from its transform (wavelet) coefficients is possible only if

$$A\|f\|^2 \le \sum_{m,n}|\langle\psi_{m,n},f\rangle|^2 \le B\|f\|^2.$$

If this condition is satisfied, then the family $(\psi_{m,n})_{m,n\in\mathbb{Z}}$ constitutes a frame. When $A = B = 1$, and $\|\psi_{m,n}\| = 1$ for all $m, n$, the family of wavelets is an orthonormal basis (what we will call a tight frame with a frame bound equal to 1). These notions will be defined in Section 5.3.2.

Until now, we have seen how the continuous-time wavelet transform can be discretized and what the conditions on that discretized version are so that a numerically stable reconstruction from $(\langle\psi_{m,n},f\rangle)_{m,n}$ is possible. What about the short-time Fourier transform? As we have seen in Section 5.2, the basis functions are given by (5.2.1). As before, we would like to be able to reconstruct the function from the samples taken on a discrete grid. In the same manner as for the wavelet transform, it is possible to discretize the short-time Fourier transform as follows: In $g_{\omega,\tau}(t) = e^{j\omega t}w(t-\tau)$ choose $\omega = m\omega_0$ and $\tau = nt_0$, with $\omega_0, t_0 > 0$ fixed and $m, n \in \mathbb{Z}$, so that

$$g_{m,n}(t) = e^{jm\omega_0 t}\,w(t-nt_0). \qquad (5.3.3)$$

Again, we would like to know whether it is possible to reconstruct a given function $f$ from its transform coefficients $(\langle g_{m,n},f\rangle)_{m,n}$ in a numerically stable way and, again, the answer is positive provided that the $g_{m,n}$ constitute a frame. Then, the reconstruction formula becomes

$$f = \sum_{m,n}\langle\tilde{g}_{m,n},f\rangle\,g_{m,n} = \sum_{m,n}\langle g_{m,n},f\rangle\,\tilde{g}_{m,n},$$

where $\tilde{g}_{m,n}$ are the vectors of the dual frame, and

$$\langle g_{m,n},f\rangle = \int_{-\infty}^{\infty}e^{-jm\omega_0 t}\,w^*(t-nt_0)\,f(t)\,dt.$$
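A finite-dimensional analogue makes the frame condition concrete. The sketch below is entirely our own construction: a discrete Gabor system in $\mathbb{C}^{64}$ with a periodized Gaussian window, time step $t_0 = 8$ samples and $M = 32$ modulation frequencies (so $\omega_0 = 2\pi/32$ radians per sample). It computes the frame bounds as the extreme eigenvalues of the frame operator and reconstructs a random signal with the dual frame described in Section 5.3.2.

```python
import numpy as np

N, t0, M = 64, 8, 32                     # signal length, time step t0 (samples), number of frequencies
k = np.arange(N)

# Hypothetical window: periodized Gaussian (std 4 samples) centered at sample 0, unit norm.
offset = (k + N // 2) % N - N // 2       # signed distance from sample 0 on the circle
w = np.exp(-0.5 * (offset / 4.0)**2)
w /= np.linalg.norm(w)

# Rows of G are g_{m,n}[k] = exp(j m omega_0 k) w[k - n t0] with omega_0 = 2 pi / M, cf. (5.3.3).
G = np.array([np.exp(2j * np.pi * m * k / M) * np.roll(w, n * t0)
              for n in range(N // t0) for m in range(M)])

S = G.T @ G.conj()                       # frame operator: sum over j of g_j g_j^H
eigs = np.linalg.eigvalsh(S)
A, B = eigs[0], eigs[-1]                 # tightest frame bounds
redundancy = G.shape[0] / N              # 256 vectors in C^64, so 4

x = np.random.default_rng(0).standard_normal(N)
coeffs = G.conj() @ x                    # <g_{m,n}, x>
dual = np.linalg.solve(S, G.T).T         # rows are S^{-1} g_j, the dual frame
x_rec = dual.T @ coeffs                  # sum over j of <g_j, x> times dual vector j
print(A, B, redundancy, np.max(np.abs(x_rec - x)))
```

Because every window has unit norm, the trace of the frame operator equals the number of vectors, so the redundancy always lies between $A$ and $B$.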



5.3.2 Reconstruction in Frames 

As we have just seen, for numerically stable reconstruction, the vectors used for the expansion have to constitute a frame. Therefore, in this section, we will present an overview of frames, as well as an algorithm to reconstruct $f$ from its transform coefficients. For a more detailed and rigorous account of frames, see [72, 73].

Definition 5.5

A family of functions $(\gamma_j)_{j\in J}$ in a Hilbert space $\mathcal{H}$ is called a frame if there exist $0 < A \le B < \infty$, such that, for all $f$ in $\mathcal{H}$,

$$A\|f\|^2 \le \sum_{j\in J}|\langle\gamma_j,f\rangle|^2 \le B\|f\|^2, \qquad (5.3.4)$$

where $A$ and $B$ are called frame bounds.

If the two frame bounds are equal, the frame is called a tight frame. In that case, and if $\|\gamma_j\| = 1$, $A = B$ gives the "redundancy ratio", or the oversampling ratio. If that ratio equals 1, we obtain the "critical" sampling case, or an orthonormal basis. These observations lead to the following proposition [73]:

Proposition 5.6

If $(\gamma_j)_{j\in J}$ is a tight frame, with frame bound $A = 1$, and if $\|\gamma_j\| = 1$ for all $j \in J$, then the $\gamma_j$ constitute an orthonormal basis.

Note that the converse is just Parseval's formula. That is, an orthonormal basis is also a tight frame with frame bounds equal to 1.
Since for a tight frame $\sum_{j\in J}|\langle\gamma_j,f\rangle|^2 = A\|f\|^2$, or, $\sum_{j\in J}\langle f,\gamma_j\rangle\langle\gamma_j,g\rangle = A\langle f,g\rangle$, we can say that (at least in the weak sense [73])

$$f = \frac{1}{A}\sum_{j\in J}\langle\gamma_j,f\rangle\,\gamma_j. \qquad (5.3.5)$$

This gives us an easy way to recover $f$ from its transform coefficients $\langle\gamma_j,f\rangle$ if the frame is tight. Note that (5.3.5) with $A = 1$ gives the usual reconstruction formula for an orthonormal basis.

A frame, however (even a tight frame), is in general not an orthonormal basis; it is a set of nonindependent vectors, as is shown in the following examples.

Example 5.1

Consider $\mathbb{R}^2$ and the redundant set of vectors $\varphi_0 = [1, 0]^T$, $\varphi_1 = [-1/2, \sqrt{3}/2]^T$ and $\varphi_2 = [-1/2, -\sqrt{3}/2]^T$ (this overcomplete set was briefly discussed in Example 1.1 and shown in Figure 1.1). Creating a matrix $M = [\varphi_0, \varphi_1, \varphi_2]$, it is easy to verify that

$$MM^T = \frac{3}{2}I,$$

and thus, any vector $x \in \mathbb{R}^2$ can be written as

$$x = \frac{2}{3}\sum_{i=0}^{2}\langle\varphi_i,x\rangle\,\varphi_i. \qquad (5.3.6)$$

Note that $\|\varphi_i\| = 1$, and thus $3/2$ is the redundancy factor. Also, in (5.3.6), the dual set is identical to the vectors of the expansion. However, this set is not unique, because the $\varphi_i$'s are linearly dependent. Since $\sum_{i=0}^{2}\varphi_i = 0$, we can choose

$$\tilde{\varphi}_i = \varphi_i + \begin{bmatrix}\alpha\\ \beta\end{bmatrix}$$

and still obtain

$$x = \frac{2}{3}\sum_{i=0}^{2}\langle\tilde{\varphi}_i,x\rangle\,\varphi_i.$$

The particular choice $\alpha = \beta = 0$ leads to $\tilde{\varphi}_i = \varphi_i$.⁴ See Problem 5.5 for a more general version of this example.
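Example 5.1 can be checked in a few lines. The sketch below verifies $MM^T = (3/2)I$ and the reconstruction (5.3.6), and shows that adding the same vector $[\alpha, \beta]^T$ (arbitrary, hypothetical values) to every analysis vector leaves the reconstruction unchanged, because $\sum_i\varphi_i = 0$.

```python
import numpy as np

s3 = np.sqrt(3.0)
M = np.array([[1.0, -0.5, -0.5],
              [0.0, s3 / 2, -s3 / 2]])        # columns are phi_0, phi_1, phi_2

frame_op = M @ M.T                            # sum_i phi_i phi_i^T, expected (3/2) I
x = np.array([0.7, -1.3])                     # arbitrary test vector
x_rec = (2.0 / 3.0) * M @ (M.T @ x)           # (5.3.6)

# Alternative dual: add the same vector [alpha, beta]^T to every phi_i (hypothetical values).
alpha, beta = 0.4, -2.0
dual_alt = M + np.array([[alpha], [beta]])    # columns are phi_i + [alpha, beta]^T
x_rec_alt = (2.0 / 3.0) * M @ (dual_alt.T @ x)
print(frame_op, x_rec, x_rec_alt)
```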

Example 5.2

Consider a two-channel filter bank, as given in Chapter 3, but this time with no downsampling (see Section 3.5.1). Obviously, the output is simply

$$\hat{X}(z) = [G_0(z)H_0(z) + G_1(z)H_1(z)]\,X(z).$$



⁴ This particular choice is unique, and leads to the dual frame (which happens to be identical to the frame in this case).




Suppose now that the two filters $G_0(z)$ and $G_1(z)$ are of unit norm and satisfy

$$G_0(z)G_0(z^{-1}) + G_1(z)G_1(z^{-1}) = 2.$$

Then, setting $H_0(z) = G_0(z^{-1})$ and $H_1(z) = G_1(z^{-1})$, we get

$$\hat{X}(z) = [G_0(z)G_0(z^{-1}) + G_1(z)G_1(z^{-1})]\,X(z) = 2\cdot X(z). \qquad (5.3.7)$$

Write this in time domain using the impulse responses $g_0[n]$ and $g_1[n]$ and their translates. The output of the filter $h_0[n] = g_0[-n]$ at time $k$ equals $\langle g_0[n-k], x[n]\rangle$ and thus contributes $\langle g_0[n-k], x[n]\rangle\cdot g_0[m-k]$ to the output at time $m$. A similar relation holds for $g_1[n-k]$. Therefore, using these relations and (5.3.7), we can write

$$\hat{x}[m] = \sum_{i=0}^{1}\sum_{k\in\mathbb{Z}}\langle g_i[n-k], x[n]\rangle\,g_i[m-k] = 2\cdot x[m].$$

That is, the set $\{g_i[n-k]\}$, $i = 0, 1$, and $k \in \mathbb{Z}$, forms a tight frame for $l^2(\mathbb{Z})$ with a redundancy factor $R = 2$. The redundancy factor indicates the oversampling rate, which is indeed a factor of two in our two-channel, nondownsampled case. The vectors $g_i[n-k]$, $k \in \mathbb{Z}$, are not independent; indeed, there are twice as many as would be needed to uniquely represent the vectors in $l^2(\mathbb{Z})$. This redundancy, however, allows for more freedom in the design of $g_i[n-k]$. Moreover, the representation is now shift-invariant, unlike in the critically sampled case.
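Example 5.2 can be reproduced with any orthogonal pair of filters. The sketch below assumes the 4-tap Daubechies lowpass filter and its alternating-flip highpass (our choice; the example does not fix the filters), runs the nondownsampled analysis and synthesis on a random finite sequence, and checks both $\hat{x} = 2x$ and the tight-frame identity $\sum_{i,k}|\langle g_i[n-k], x[n]\rangle|^2 = 2\|x\|^2$.

```python
import numpy as np

s3 = np.sqrt(3.0)
g0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))   # unit-norm Daubechies 4-tap lowpass
g1 = np.array([(-1)**n * g0[len(g0) - 1 - n] for n in range(len(g0))])  # alternating-flip highpass

def analysis_synthesis(x, g):
    # c[k] = <g[n - k], x[n]> for every shift k that overlaps x, then y[m] = sum_k c[k] g[m - k].
    L = len(g)
    c = np.convolve(x, g[::-1])                   # entry j holds the shift k = j - (L - 1)
    y = np.convolve(c, g)[L - 1:L - 1 + len(x)]   # align the output with x[0], ..., x[len(x) - 1]
    return c, y

x = np.random.default_rng(1).standard_normal(50)
c0, y0 = analysis_synthesis(x, g0)
c1, y1 = analysis_synthesis(x, g1)
print(np.max(np.abs(y0 + y1 - 2 * x)))                   # reconstruction (5.3.7)
print(np.sum(c0**2) + np.sum(c1**2), 2 * np.sum(x**2))   # tight frame with bound 2
```

The identity holds exactly for finite sequences because the autocorrelations of $g_0$ and $g_1$ sum to $2\delta[n]$, the time-domain form of the power-complementary condition above.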

What about reconstructing with frames that are not tight? Let us define the frame operator $\Gamma$ from $L^2(\mathbb{R})$ to $l^2(J)$ as

$$(\Gamma f)_j = \langle\gamma_j,f\rangle. \qquad (5.3.8)$$

Since $(\gamma_j)_{j\in J}$ constitute a frame, we know from (5.3.4) that $\|\Gamma f\|^2 \le B\|f\|^2$, that is, $\Gamma$ is bounded, which means that it is possible to find its adjoint operator $\Gamma^*$. Note first that the adjoint operator is a mapping from $l^2(J)$ to $L^2(\mathbb{R})$. Then, $\langle f,\Gamma^*c\rangle$ is an inner product over $L^2(\mathbb{R})$, while $\langle\Gamma f,c\rangle$ is an inner product over $l^2(J)$. The adjoint operator can be computed from the following relation (see (2.A.2))

$$\langle f,\Gamma^*c\rangle = \langle\Gamma f,c\rangle = \sum_{j\in J}\langle\gamma_j,f\rangle^*c_j. \qquad (5.3.9)$$

Exchanging the order in the inner product, we get that

$$\sum_{j\in J}\langle\gamma_j,f\rangle^*c_j = \sum_{j\in J}\langle f,\gamma_j\rangle c_j = \left\langle f,\sum_{j\in J}c_j\gamma_j\right\rangle. \qquad (5.3.10)$$

Comparing the left side of (5.3.9) with the right side of (5.3.10), we find the adjoint operator as

$$\Gamma^*c = \sum_{j\in J}c_j\gamma_j. \qquad (5.3.11)$$
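From this it follows that

$$\sum_{j\in J}\langle\gamma_j,f\rangle\,\gamma_j = \Gamma^*\Gamma f. \qquad (5.3.12)$$

Using this adjoint operator, we can express condition (5.3.4) as ($I$ is the identity operator)

$$A\cdot I \le \Gamma^*\Gamma \le B\cdot I, \qquad (5.3.13)$$

from where it follows that $\Gamma^*\Gamma$ is invertible (see Lemma 3.2.2 in [73]). Applying this inverse $(\Gamma^*\Gamma)^{-1}$ to the family of vectors $\gamma_j$ leads to another family $\tilde{\gamma}_j$ which also constitutes a frame. The vectors $\tilde{\gamma}_j$ are given by

$$\tilde{\gamma}_j = (\Gamma^*\Gamma)^{-1}\gamma_j. \qquad (5.3.14)$$

This new family of vectors is called a dual frame and it satisfies

$$B^{-1}\|f\|^2 \le \sum_{j\in J}|\langle\tilde{\gamma}_j,f\rangle|^2 \le A^{-1}\|f\|^2,$$

and the reconstruction formula becomes

$$\sum_{j\in J}\langle\gamma_j,f\rangle\,\tilde{\gamma}_j = (\Gamma^*\Gamma)^{-1}\sum_{j\in J}\langle\gamma_j,f\rangle\,\gamma_j = (\Gamma^*\Gamma)^{-1}\Gamma^*\Gamma f = f,$$

where we have used (5.3.14), (5.3.8) and (5.3.11). Therefore, one can write

$$\sum_{j\in J}\langle\gamma_j,f\rangle\,\tilde{\gamma}_j = f = \sum_{j\in J}\langle\tilde{\gamma}_j,f\rangle\,\gamma_j. \qquad (5.3.15)$$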
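In finite dimensions with real vectors, $\Gamma$ is simply the matrix whose rows are the $\gamma_j$, so (5.3.12) through (5.3.15) can be checked directly. The sketch below uses a random frame of seven vectors in $\mathbb{R}^3$ (a hypothetical example), takes the extreme eigenvalues of $\Gamma^*\Gamma$ as the tightest frame bounds, and computes the dual frame by solving with $\Gamma^*\Gamma$ rather than forming its inverse explicitly.

```python
import numpy as np

rng = np.random.default_rng(2)
Gam = rng.standard_normal((7, 3))        # rows gamma_j: a hypothetical frame of 7 vectors in R^3

S = Gam.T @ Gam                          # Gamma* Gamma = sum_j gamma_j gamma_j^T, cf. (5.3.12)
eigs = np.linalg.eigvalsh(S)
A, B = eigs[0], eigs[-1]                 # tightest bounds in A I <= Gamma* Gamma <= B I, (5.3.13)

dual = np.linalg.solve(S, Gam.T).T       # rows are (Gamma* Gamma)^{-1} gamma_j, (5.3.14)

f = rng.standard_normal(3)
f_a = dual.T @ (Gam @ f)                 # sum_j <gamma_j, f> dual_j
f_b = Gam.T @ (dual @ f)                 # sum_j <dual_j, f> gamma_j
print(A, B, np.allclose(f_a, f), np.allclose(f_b, f))
```

The dual frame operator equals $(\Gamma^*\Gamma)^{-1}$, which is why the dual bounds are $B^{-1}$ and $A^{-1}$.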

The above relation shows how to obtain a reconstruction formula for $f$ from $\langle\gamma_j,f\rangle$, where the only thing one has to compute is $\tilde{\gamma}_j = (\Gamma^*\Gamma)^{-1}\gamma_j$, given by

$$\tilde{\gamma}_j = \frac{2}{A+B}\sum_{k=0}^{\infty}\left(I - \frac{2}{A+B}\Gamma^*\Gamma\right)^k\gamma_j. \qquad (5.3.16)$$

We now sketch a proof of this relation (see [73] for a rigorous development).
Proof

If frame bounds $A$ and $B$ are close, that is, if

$$r = \frac{B}{A} - 1 \ll 1,$$

then (5.3.13) implies that $\Gamma^*\Gamma$ is close to $((A+B)/2)I$, or $(\Gamma^*\Gamma)^{-1}$ is close to $(2/(A+B))I$. This further means that the function $f$ can be written as follows:

$$f = \frac{2}{A+B}\sum_{j \in J} \langle \gamma_j, f \rangle \gamma_j + Rf,$$

where $R$ is given by (use (5.3.12))

$$R = I - \frac{2}{A+B}\Gamma^*\Gamma. \qquad (5.3.17)$$

Using (5.3.13) we obtain

$$-\frac{B-A}{B+A}\,I \le R \le \frac{B-A}{B+A}\,I,$$

and as a result,

$$\|R\| \le \frac{B-A}{B+A} = \frac{r}{2+r} < 1. \qquad (5.3.18)$$

From (5.3.17) and using (5.3.18), $(\Gamma^*\Gamma)^{-1}$ can be written as (see also (2.A.1))

$$(\Gamma^*\Gamma)^{-1} = \frac{2}{A+B}(I - R)^{-1} = \frac{2}{A+B}\sum_{k=0}^{\infty} R^k,$$

implying that

$$\tilde{\gamma}_j = (\Gamma^*\Gamma)^{-1}\gamma_j = \frac{2}{A+B}\sum_{k=0}^{\infty} R^k\gamma_j = \frac{2}{A+B}\sum_{k=0}^{\infty}\Big(I - \frac{2}{A+B}\Gamma^*\Gamma\Big)^k\gamma_j. \qquad (5.3.19)$$

Note that if $B/A$ is close to one, that is, if $r$ is small, then $R$ is close to zero and convergence in (5.3.19) is fast. If the frame is tight, that is, $A = B$, then $R = 0$ and $\tilde{\gamma}_j = \gamma_j/A$; if, moreover, it is an orthonormal basis, that is, $A = 1$, then $\tilde{\gamma}_j = \gamma_j$.
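The series (5.3.19) can be checked numerically on the same kind of toy example. The Python sketch below, again assuming a hypothetical three-vector frame of $\mathcal{R}^2$, sums the first terms of $(2/(A+B))\sum_k R^k\gamma_j$ with $R = I - (2/(A+B))\Gamma^*\Gamma$ and compares the result with the exact $(\Gamma^*\Gamma)^{-1}\gamma_j$. For this frame $(B-A)/(B+A) \approx 0.52$, so the zeroth-order term alone is a poor approximation and many terms are needed; a snug frame would need far fewer.

```python
import math

def frame_operator(frame):
    """Gamma* Gamma = sum_j gamma_j gamma_j^T for real 2-D frame vectors."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for g in frame:
        for r in range(2):
            for c in range(2):
                s[r][c] += g[r] * g[c]
    return s

def frame_bounds(s):
    """Extreme eigenvalues of the symmetric 2x2 matrix s."""
    half_trace = (s[0][0] + s[1][1]) / 2
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    disc = math.sqrt(half_trace ** 2 - det)
    return half_trace - disc, half_trace + disc

def mat_vec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def dual_by_series(gamma, s, A, B, terms):
    """Truncation of (5.3.19): (2/(A+B)) * sum_{k=0}^{terms-1} R^k gamma."""
    mu = 2.0 / (A + B)
    term = list(gamma)                                        # R^0 gamma
    total = list(gamma)
    for _ in range(terms - 1):
        s_term = mat_vec(s, term)
        term = [t - mu * st for t, st in zip(term, s_term)]   # apply R = I - mu Gamma*Gamma
        total = [x + y for x, y in zip(total, term)]
    return [mu * x for x in total]

frame = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]   # hypothetical frame: Gamma*Gamma = [[2, 1], [1, 5]]
s = frame_operator(frame)
A, B = frame_bounds(s)
rho = (B - A) / (B + A)                         # bound on ||R|| from (5.3.18)
exact = [4 / 9, 1 / 9]                          # (Gamma*Gamma)^{-1} [1, 1] for this frame
for terms in (1, 5, 20, 60):
    approx = dual_by_series([1.0, 1.0], s, A, B, terms)
    print(terms, max(abs(x - e) for x, e in zip(approx, exact)))
```

The printed error shrinks roughly like $\rho^{\text{terms}}$, which is the geometric convergence promised by (5.3.18).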

We have seen, for example, in the wavelet transform case, that to have a numerically stable reconstruction, we require that $\{\psi_{m,n}\}$ constitute a frame. If $\{\psi_{m,n}\}$ do constitute a frame, we found an algorithm to reconstruct $f$ from $\langle f, \psi_{m,n} \rangle$, given by (5.3.15) with $\tilde{\gamma}_j$ as in (5.3.16). For this algorithm to work, we have to obtain estimates of frame bounds.
5.3.3 Frames of Wavelets and STFT

In the last section, we dealt with abstract issues regarding frames and reconstruction. Here, we will discuss some particularities of frames of wavelets and of the short-time Fourier transform. The main point of this section will be that for wavelet frames, there are no really strong constraints on $\psi(t)$, $a_0$, $b_0$. On the other hand, for the short-time Fourier transform, the situation is more complicated and having good frames will be possible only for certain choices of $\omega_0$ and $t_0$. Moreover, if we want to avoid redundancy and critically sample the short-time Fourier transform, we will have to give up either good time or good frequency resolution. This is the content of the Balian-Low theorem, given later in this section.

In all the cases mentioned above, we need to have some estimates of the frame bounds in order to compute the dual frame. Therefore, we start with wavelet frames and show that a family of wavelets being a frame imposes the admissibility condition for the "mother" wavelet. We give the result here without proof (for a proof, refer to [73]).

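Proposition 5.7

If the $\psi_{m,n}(t) = a_0^{-m/2}\psi(a_0^{-m}t - nb_0)$, $m, n \in \mathcal{Z}$, constitute a frame for $L_2(\mathcal{R})$ with frame bounds $A$, $B$, then

$$A \le \frac{2\pi}{b_0 \ln a_0}\int_0^{\infty}\frac{|\Psi(\omega)|^2}{\omega}\,\frac{d\omega}{2\pi} \le B \qquad (5.3.20)$$

and

$$A \le \frac{2\pi}{b_0 \ln a_0}\int_{-\infty}^{0}\frac{|\Psi(\omega)|^2}{|\omega|}\,\frac{d\omega}{2\pi} \le B. \qquad (5.3.21)$$

Compare these expressions with the admissibility condition given in (5.1.2). It is obvious that the fact that the wavelets form a frame automatically imposes the admissibility condition on the "mother" wavelet. This proposition will also help us find frame bounds in the case when the frame is tight ($A = B$), since then

$$A = \frac{1}{b_0 \ln a_0}\int_0^{\infty}\frac{|\Psi(\omega)|^2}{\omega}\,d\omega = \frac{1}{2b_0 \ln a_0}\int_{-\infty}^{\infty}\frac{|\Psi(\omega)|^2}{|\omega|}\,d\omega.$$

Moreover, in the orthonormal case (we use the dyadic case as an example, $A = B = 1$, $b_0 = 1$, $a_0 = 2$)

$$\int_0^{\infty}\frac{|\Psi(\omega)|^2}{\omega}\,\frac{d\omega}{2\pi} = \int_{-\infty}^{0}\frac{|\Psi(\omega)|^2}{|\omega|}\,\frac{d\omega}{2\pi} = \frac{\ln 2}{2\pi}.$$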

We mentioned previously that in order to have wavelet frames, we need not impose really strong conditions on the wavelet or on the scaling and shift factors. In other words, if $\psi(t)$ is at all a "reasonable" function (it has some decay in time and frequency, and $\int \psi(t)\,dt = 0$), then there exists a whole arsenal of $a_0$ and $b_0$ such that $\{\psi_{m,n}\}$ constitute a frame. This can be formalized, and we refer to [73] for more details (Proposition 3.3.2, in particular). In [73], explicit estimates for frame bounds $A$, $B$, as well as possible choices for $\psi$, $a_0$, $b_0$, are given.

Figure 5.9 The Mexican-hat function $\psi(t) = (2/3^{1/2})\,\pi^{-1/4}(1 - t^2)\,e^{-t^2/2}$. The rotated $\psi(t)$ gives rise to a Mexican hat, hence the name of the function.

Example 5.3

As an example of the previous discussion, consider the so-called Mexican-hat function

$$\psi(t) = \frac{2}{\sqrt{3}}\,\pi^{-1/4}(1 - t^2)\,e^{-t^2/2},$$

given in Figure 5.9. Table 5.1 gives a few values for frame bounds $A$, $B$ with $a_0 = 2$ and varying $b_0$. Note, for example, how for certain values of $b_0$, the frame is almost tight, a so-called "snug" frame. The advantage of working with such a frame is that we can use just the 0th-order term in the reconstruction formula (5.3.16) and still get a good approximation of $f$. Another interesting point is that when the frame is almost tight, the frame bounds (which are close) are inversely proportional to $b_0$. Since the frame bounds in this case measure the redundancy of the frame, when $b_0$ is halved (twice as many points on the grid), the frame bounds should double (redundancy increases by two since we have twice as many functions). Note also how for the value of $b_0 = 1.50$, the ratio $B/A$ increases suddenly. Actually, for larger values of $b_0$, the set $\{\psi_{m,n}\}$ is not even a frame any more, since $A$ is not strictly positive anymore.
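Proposition 5.7 gives a quick sanity check on Table 5.1: for any wavelet frame, the quantity $(1/(b_0 \ln a_0))\int_0^\infty |\Psi(\omega)|^2/\omega\,d\omega$ must lie between $A$ and $B$. For the Mexican hat, assuming the Fourier convention $\Psi(\omega) = \int \psi(t)e^{-j\omega t}\,dt$, one finds $\Psi(\omega) = (2/\sqrt{3})\,\pi^{-1/4}\sqrt{2\pi}\,\omega^2 e^{-\omega^2/2}$, so the integral equals $(4/3)\sqrt{\pi}$. The Python sketch below evaluates it numerically with a midpoint rule and compares it with the tabulated bounds; it is an illustration, not the method used in [73] to compute $A$ and $B$.

```python
import math

def mexican_hat_spectrum_sq(omega):
    """|Psi(omega)|^2 for psi(t) = (2/sqrt 3) pi^{-1/4} (1 - t^2) exp(-t^2/2),
    assuming Psi(omega) = integral of psi(t) exp(-j omega t) dt."""
    c = (2 / math.sqrt(3)) * math.pi ** -0.25 * math.sqrt(2 * math.pi)
    return (c * omega ** 2 * math.exp(-omega ** 2 / 2)) ** 2

def tight_estimate(b0, a0=2.0, omega_max=12.0, steps=20000):
    """(1 / (b0 ln a0)) * integral_0^inf |Psi(omega)|^2 / omega d omega (midpoint rule)."""
    h = omega_max / steps
    integral = h * sum(mexican_hat_spectrum_sq((i + 0.5) * h) / ((i + 0.5) * h)
                       for i in range(steps))
    return integral / (b0 * math.log(a0))

# Frame bounds (A, B) of Table 5.1, a0 = 2.
table_5_1 = {0.25: (13.091, 14.183), 0.50: (6.546, 7.092), 0.75: (4.364, 4.728),
             1.00: (3.223, 3.596), 1.25: (2.001, 3.454), 1.50: (0.325, 4.221)}
for b0, (A, B) in table_5_1.items():
    print(f"b0={b0:.2f}  A={A:.3f}  estimate={tight_estimate(b0):.3f}  B={B:.3f}")
```

For $b_0 = 0.25$ the estimate is about 13.64, essentially the midpoint of $A$ and $B$, and it scales as $1/b_0$ exactly as the text describes for snug frames. For $b_0 = 1.50$ it still lies between the bounds, but the bounds themselves are far apart.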



Finally, let us say a few words on time-frequency localization properties of wavelet frames. Recall that one of the reasons we opted for the wavelet-type signal expansions is because they allegedly provide good localization in both time and frequency. Let us here, for the sake of discussion, assume that $|\psi|$ and $|\Psi|$ are symmetric, $\psi$ is centered around $t = 0$, and $\Psi$ is centered around $\omega = \omega_0$ (this implies that $\psi_{m,n}$ will be centered around $t = a_0^m n b_0$ and around $\pm a_0^{-m}\omega_0$ in frequency). This means that the inner product $\langle \psi_{m,n}, f \rangle$ represents the "information content" of $f$ near $t = a_0^m n b_0$ and near $\omega = \pm a_0^{-m}\omega_0$. If the function $f$ is localized (most of its energy lies within $|t| \le T$ and $\Omega_0 \le |\omega| \le \Omega_1$), then only the coefficients $\langle \psi_{m,n}, f \rangle$ for which $(t, \omega) = (a_0^m n b_0, \pm a_0^{-m}\omega_0)$ lies within (or very close to) $[-T, T] \times ([-\Omega_1, -\Omega_0] \cup [\Omega_0, \Omega_1])$ will be necessary for $f$ to be reconstructed up to a good approximation. This approximation property is detailed in [73] (Theorem 3.5.1, in particular).

Table 5.1 Frame bounds for Mexican-hat wavelet frames with $a_0 = 2$ (from [73]).

b0      A        B        B/A
0.25    13.091   14.183   1.083
0.50    6.546    7.092    1.083
0.75    4.364    4.728    1.083
1.00    3.223    3.596    1.116
1.25    2.001    3.454    1.726
1.50    0.325    4.221    12.986

Let us now shift our attention to the short-time Fourier transform frames. As mentioned before, we need to be able to say something about the frame bounds in order to compute the dual frame. Then, in a similar fashion to Proposition 5.7, one can obtain a very interesting result, which states that if $g_{m,n}(t)$ (as in (5.3.3)) constitute a frame for $L_2(\mathcal{R})$ with frame bounds $A$ and $B$, then

$$A \le \frac{2\pi}{\omega_0 t_0}\|g\|^2 \le B. \qquad (5.3.22)$$

Note how in this case, any tight frame will have a frame bound $A = 2\pi/(\omega_0 t_0)$ (with $\|g\| = 1$). In particular, an orthonormal basis will require the following to be true:

$$\omega_0 t_0 = 2\pi.$$

Beware, however, that $\omega_0 t_0 = 2\pi$ will not imply an orthonormal basis; it just states that we have "critically" sampled our short-time Fourier transform.⁵ Note that in (5.3.22) $g$ does not appear (except $\|g\|$, which can always be normalized to 1), as opposed to (5.3.20), (5.3.21). This is similar to the absence of an admissibility condition for the continuous-time short-time Fourier transform (see Section 5.2). On the other hand, we see that $\omega_0$, $t_0$ cannot be arbitrarily chosen. In fact, there are no short-time Fourier transform frames for $\omega_0 t_0 > 2\pi$. Even more is true: In order to have good time-frequency localization, we require that $\omega_0 t_0 < 2\pi$. The last remaining case, that of critical sampling, $\omega_0 t_0 = 2\pi$, is very interesting. Unlike for the wavelet frames, it turns out that no critically sampled short-time Fourier transform frames are possible with good time and frequency localization. Actually, the following theorem states just that.

⁵In signal processing terms, this corresponds to the Nyquist rate.

[Figure 5.10: the $(\omega_0, t_0)$ plane divided by the curve $\omega_0 t_0 = 2\pi$ into the three cases listed in the caption.]
Figure 5.10 Short-time Fourier transform case: no frames are possible for $\omega_0 t_0 > 2\pi$. There exist frames with bad time-frequency localization for $\omega_0 t_0 = 2\pi$. Frames (even tight frames) with excellent time-frequency localization are possible for $\omega_0 t_0 < 2\pi$ (after [73]).



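THEOREM 5.8 (Balian-Low)

If the $g_{m,n}(t) = e^{j2\pi mt}\,w(t - n)$, $m, n \in \mathcal{Z}$, constitute a frame for $L_2(\mathcal{R})$, then

$$\text{either} \quad \int t^2 |w(t)|^2\,dt = \infty \quad \text{or} \quad \int \omega^2 |W(\omega)|^2\,d\omega = \infty.$$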

For a proof, see [73]. Note that in the statement of the theorem, $t_0 = 1$, $\omega_0 = 2\pi/t_0 = 2\pi$. Thus, in this case ($\omega_0 t_0 = 2\pi$), we will necessarily have bad localization either in time or in frequency (or possibly both). This theorem has profound consequences, since it also implies that no good short-time Fourier transform orthonormal bases (good meaning with good time and frequency localization) are achievable (since orthonormal bases are necessarily critically sampled). This is similar to the discrete-time result we have seen in Chapter 3, Theorem 3.17. The previous discussion is pictorially represented in Figure 5.10 (after [73]).
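The bound (5.3.22) and the classification of Figure 5.10 are easy to check against Table 5.2. The Python sketch below assumes the unit-norm Gaussian window of Example 5.4 and the sampling $\omega_0 = t_0 = (2\pi\lambda)^{1/2}$; it computes $2\pi\|w\|^2/(\omega_0 t_0)$, which equals $1/\lambda$, and checks that it falls between the tabulated $A$ and $B$. The function `regime` is a hypothetical helper that merely encodes the three cases of Figure 5.10; it says nothing about a particular window.

```python
import math

def gaussian_window(t):
    """Unit-norm Gaussian window w(t) = pi^{-1/4} exp(-t^2/2)."""
    return math.pi ** -0.25 * math.exp(-t * t / 2)

def norm_sq(window, t_max=10.0, steps=20000):
    """Midpoint-rule approximation of ||w||^2 on [-t_max, t_max]."""
    h = 2 * t_max / steps
    return h * sum(window(-t_max + (i + 0.5) * h) ** 2 for i in range(steps))

def stft_frame_estimate(omega0, t0, g_norm_sq):
    """2 pi ||g||^2 / (omega0 t0): by (5.3.22) it must lie in [A, B]."""
    return 2 * math.pi * g_norm_sq / (omega0 * t0)

def regime(omega0, t0, tol=1e-12):
    """Hypothetical helper encoding the three cases of Figure 5.10."""
    product = omega0 * t0
    if product > 2 * math.pi + tol:
        return "no frames"
    if product > 2 * math.pi - tol:
        return "critical sampling: frames possible, but with bad time-frequency localization"
    return "good, tight frames possible"

# Frame bounds (A, B) of Table 5.2, keyed by lambda.
table_5_2 = {0.250: (3.899, 4.101), 0.375: (2.500, 2.833), 0.500: (1.575, 2.425),
             0.750: (0.582, 2.089), 0.950: (0.092, 2.021)}
w_norm_sq = norm_sq(gaussian_window)
for lam, (A, B) in table_5_2.items():
    t0 = omega0 = math.sqrt(2 * math.pi * lam)
    estimate = stft_frame_estimate(omega0, t0, w_norm_sq)
    print(f"lambda={lam:.3f}  A={A:.3f}  estimate={estimate:.3f}  B={B:.3f}  ({regime(omega0, t0)})")
```

Note how the estimate sits near the middle of $[A, B]$ for small $\lambda$, while the interval widens dramatically as $\lambda$ approaches 1, that is, as sampling approaches the critical case covered by the Balian-Low theorem.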

A few more remarks about the short-time Fourier transform: First, as in the wavelet case, it is possible to obtain estimates of the frame bounds $A$, $B$. Unlike the wavelet case, however, the dual frame is always generated by a single function $\tilde{w}$. To see that, first introduce the shift operator $Tw(t) = w(t - t_0)$ and the operator $Ew(t) = e^{j\omega_0 t}w(t)$. Then, $g_{m,n}(t)$ can be expressed as

$$g_{m,n}(t) = e^{jm\omega_0 t}\,w(t - nt_0) = E^m T^n w(t).$$

One can easily check that both $T$ and $E$ commute with $\Gamma^*\Gamma$ and thus with $(\Gamma^*\Gamma)^{-1}$ as well [225]. Then, the dual frame can be found from (5.3.14):

$$\begin{aligned} \mathrm{dual}(g_{m,n})(t) &= (\Gamma^*\Gamma)^{-1} g_{m,n}(t) \\ &= (\Gamma^*\Gamma)^{-1} E^m T^n w(t) \\ &= E^m T^n (\Gamma^*\Gamma)^{-1} w(t) \\ &= E^m T^n \tilde{w}(t) \\ &= \tilde{g}_{m,n}(t), \end{aligned} \qquad (5.3.23)$$

where $\tilde{w} = (\Gamma^*\Gamma)^{-1}w$.

Table 5.2 Frame bounds for the Gaussian and $\omega_0 = t_0 = (2\pi\lambda)^{1/2}$, for $\lambda = 0.25, 0.375, 0.5, 0.75, 0.95$ (from [73]).

λ       A       B       B/A
0.250   3.899   4.101   1.052
0.375   2.500   2.833   1.133
0.500   1.575   2.425   1.539
0.750   0.582   2.089   3.592
0.950   0.092   2.021   22.004
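The commutation argument behind (5.3.23) can be observed in a finite, periodic analog. The Python sketch below is only that analog (an assumption, not the $L_2(\mathcal{R})$ setting of the text): signals have length $L$, $T$ is a cyclic shift by $a$ samples, $E$ multiplies by $e^{j2\pi k/M}$, and the window spans at most $M$ samples. In this special case $\Gamma^*\Gamma$ reduces to multiplication by $d[k] = M\sum_n |w[k - na]|^2$, so its inverse is explicit; one can then check that $(\Gamma^*\Gamma)^{-1}E^mT^n w = E^mT^n\tilde{w}$ and that the atoms generated by $\tilde{w}$ reconstruct $f$ through (5.3.15). The window values and all names are hypothetical.

```python
import cmath

def gabor_atom(window, m, n, a, M):
    """E^m T^n window: g_{m,n}[k] = exp(j 2 pi m k / M) * window[(k - n a) mod L]."""
    L = len(window)
    return [cmath.exp(2j * cmath.pi * m * k / M) * window[(k - n * a) % L] for k in range(L)]

def frame_operator_diagonal(window, a, M):
    """Gamma*Gamma is diagonal when the window spans at most M samples:
    d[k] = M * sum_n |window[(k - n a) mod L]|^2."""
    L = len(window)
    return [M * sum(abs(window[(k - n * a) % L]) ** 2 for n in range(L // a)) for k in range(L)]

def analysis(f, window, a, M):
    """Coefficients <g_{m,n}, f> = sum_k conj(g_{m,n}[k]) f[k]."""
    L = len(f)
    return {(m, n): sum(g.conjugate() * x for g, x in zip(gabor_atom(window, m, n, a, M), f))
            for m in range(M) for n in range(L // a)}

def synthesis(coeffs, window, a, M):
    """sum_{m,n} coeffs[m, n] * E^m T^n window."""
    L = len(window)
    out = [0j] * L
    for (m, n), c in coeffs.items():
        for k, g in enumerate(gabor_atom(window, m, n, a, M)):
            out[k] += c * g
    return out

L, a, M = 12, 2, 6                                  # L must be a multiple of both a and M
window = [1.0, 3.0, 2.0, 1.0] + [0.0] * (L - 4)     # hypothetical window, support 4 <= M
d = frame_operator_diagonal(window, a, M)
A, B = min(d), max(d)                               # frame bounds of this finite Gabor frame
dual_window = [w / dk for w, dk in zip(window, d)]  # w~ = (Gamma*Gamma)^{-1} w

f = [complex(k % 5, (3 * k) % 7) for k in range(L)]
f_rec = synthesis(analysis(f, window, a, M), dual_window, a, M)   # left side of (5.3.15)
```

Here $A = 30$ and $B = 60$. The average of $d[k]$ is $(M/a)\|w\|^2 = 45$, the discrete counterpart of $2\pi\|g\|^2/(\omega_0 t_0)$ in (5.3.22), and the reconstruction only needs the single dual window, as (5.3.23) predicts.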



To conclude this section, we will consider an example from [73], the Gaussian window, where it can be shown how, as oversampling approaches critical sampling, the dual frame starts to "misbehave."

Example 5.4 (after [73])

Consider a Gaussian window

$$w(t) = \pi^{-1/4}e^{-t^2/2}$$

and a special case when $\omega_0 = t_0 = \sqrt{2\pi\lambda}$, or $\omega_0 t_0 = 2\pi\lambda$ (note that $1/\lambda$ gives the oversampling factor). Let us try to find the dual frame. From (5.3.3), recall that (with the Gaussian window)

$$g_{m,n}(t) = e^{jm\omega_0 t}\,w(t - nt_0).$$

Also, since the dual frame functions $\tilde{g}_{m,n}(t)$ are generated from a single function $\tilde{w}(t)$ (see (5.3.23)), we will fix $m = n = 0$ and find only $\tilde{w}(t)$ from $\tilde{g}_{0,0}(t) = \tilde{w}(t)$. Then we use (5.3.16) and write

$$\tilde{w}(t) = \frac{2}{A+B}\sum_{k=0}^{\infty}\Big(I - \frac{2}{A+B}\Gamma^*\Gamma\Big)^k w(t). \qquad (5.3.24)$$
We will use the frame bounds already computed in [73]. Table 5.2 shows these frame bounds for $\lambda = 0.25, 0.375, 0.5, 0.75, 0.95$, or corresponding $t_0 = 1.25, 1.53, 1.77, 2.17, 2.44$. Each of these was taken from Table 3.3 in [73] (we took the nearest computed value). Our first step is to evaluate $\Gamma^*\Gamma w$. From (5.3.12) we know that

$$\Gamma^*\Gamma w = \sum_m \sum_n \langle g_{m,n}, w \rangle\, g_{m,n}.$$

Due to the fast decay of the functions, one computes only 10 terms on both sides (yielding a total of 21 terms in the summation for $m$ and as many for $n$). Note that for computational purposes, one has to separate the computations of the real and the imaginary parts. The iteration is obtained as follows: We start by setting $\tilde{w}(t) = w_0(t) = w(t)$. Then for each $i$, we compute

$$w_i(t) = w_{i-1}(t) - \frac{2}{A+B}\Gamma^*\Gamma w_{i-1}(t), \qquad \tilde{w}(t) = \tilde{w}(t) + w_i(t),$$

and the accumulated sum is finally multiplied by $2/(A+B)$, as in (5.3.24). Since the functions decay fast, only 20 iterations were needed in (5.3.24). Figure 5.11 shows plots of $\tilde{w}$ with $\lambda = 0.25, 0.375, 0.5, 0.75, 0.95, 1$. Note how $\tilde{w}$ becomes less and less smooth as $\lambda$ increases (oversampling decreases). Even so, for all $\lambda < 1$, these dual frames have good time-frequency localization. On the other hand, for $\lambda = 1$, $\tilde{w}$ is not even square-integrable any more and becomes one of the pathological Bastiaans functions [18]. Since in this case $A = 0$, the dual frame function $\tilde{w}$ has to be computed differently. It is given by [225]

$$\tilde{w}_B(t) = \pi^{7/4}K^{-3/2}e^{t^2/2}\sum_{n \ge |t/\sqrt{2\pi}| - 0.5}(-1)^n e^{-\pi(n+0.5)^2},$$

with $K \approx 1.854075$.
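The iteration of Example 5.4 can be reproduced approximately in a few lines. The Python sketch below makes two assumptions that differ from the procedure described above. First, instead of summing $21 \times 21$ inner products, it applies $\Gamma^*\Gamma$ through the identity obtained by summing the modulations over $m$ with the Poisson summation formula (sometimes referred to as Walnut's representation), valid here for a real window:

$$(\Gamma^*\Gamma f)(t) = \frac{2\pi}{\omega_0}\sum_k G_k(t)\,f\Big(t - \frac{2\pi k}{\omega_0}\Big), \qquad G_k(t) = \sum_n w(t - nt_0)\,w\Big(t - \frac{2\pi k}{\omega_0} - nt_0\Big).$$

Second, it samples all functions on a grid of step $t_0/8$ over $[-8, 8]$ and is restricted to $\lambda = 1/N$ with integer $N$, so that $2\pi/\omega_0 = Nt_0$ falls on the grid. The frame bounds are taken from Table 5.2; the grid, truncation, and function names are illustrative choices.

```python
import math

def gaussian(t):
    """w(t) = pi^{-1/4} exp(-t^2/2), the unit-norm window of Example 5.4."""
    return math.pi ** -0.25 * math.exp(-t * t / 2)

def dual_gaussian(N, A, B, p=8, half_width=8.0, iterations=20, kmax=3):
    """Approximate (5.3.24) for lambda = 1/N, i.e. omega0 = t0 = sqrt(2 pi / N).

    Returns the grid, the sampled window w, the sampled dual window, and the
    sampled frame operator (a function acting on lists of samples).
    """
    t0 = omega0 = math.sqrt(2 * math.pi / N)
    h = t0 / p                                  # t0 = p*h and 2*pi/omega0 = N*t0 = N*p*h
    count = int(half_width / h)
    grid = [i * h for i in range(-count, count + 1)]
    shift = N * p                               # 2*pi/omega0 expressed in samples
    n_max = int((half_width + 10) / t0)         # enough shifts for the Gaussian to decay
    G = {k: [sum(gaussian(t - n * t0) * gaussian(t - k * N * t0 - n * t0)
                 for n in range(-n_max, n_max + 1)) for t in grid]
         for k in range(-kmax, kmax + 1)}

    def frame_op(f):
        """(Gamma*Gamma f)(t_i), with f taken as zero outside the grid."""
        out = []
        for i in range(len(grid)):
            acc = 0.0
            for k in range(-kmax, kmax + 1):
                j = i - k * shift
                if 0 <= j < len(grid):
                    acc += G[k][i] * f[j]
            out.append(2 * math.pi / omega0 * acc)
        return out

    mu = 2.0 / (A + B)
    w = [gaussian(t) for t in grid]
    term = list(w)                              # w_0 = w
    total = list(w)                             # running sum w_0 + w_1 + ...
    for _ in range(iterations):
        s_term = frame_op(term)
        term = [x - mu * y for x, y in zip(term, s_term)]   # w_i = w_{i-1} - mu Gamma*Gamma w_{i-1}
        total = [x + y for x, y in zip(total, term)]
    w_dual = [mu * x for x in total]            # final scaling by 2/(A+B)
    return grid, w, w_dual, frame_op

grid, w, w_dual, frame_op = dual_gaussian(N=4, A=3.899, B=4.101)   # lambda = 0.25
residual = max(abs(x - y) for x, y in zip(frame_op(w_dual), w))
```

The residual $\max_t |(\Gamma^*\Gamma\tilde{w})(t) - w(t)|$ measures how well the truncated series inverts the frame operator. For $\lambda = 0.25$ the frame is snug and $\tilde{w}$ stays close to $w/4$; calling `dual_gaussian(2, 1.575, 2.425)` gives the $\lambda = 0.5$ case. Values of $\lambda$ that are not reciprocals of integers would need interpolation between grid points, which this sketch does not attempt.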



5.3.4 Remarks

This section dealt with overcomplete expansions called frames. Obtained by discretizing the continuous-time wavelet transform as well as the short-time Fourier transform, they are used to obtain a numerically stable reconstruction of a function $f$ from a sequence of its transform coefficients. We have seen that the conditions on wavelet frames are fairly relaxed, while the short-time Fourier transform frames suffer from a serious drawback given in the Balian-Low theorem: When critical sampling is used, it will not be possible to obtain frames with good time and frequency resolutions. As a result, orthonormal short-time Fourier transform bases are not achievable with basis functions being well localized in time and frequency.
[Figure 5.11: six panels, (a) to (f), each plotting the dual frame function $\tilde{w}(t)$.]
Figure 5.11 The dual frame functions $\tilde{w}$ for $\omega_0 = t_0 = (2\pi\lambda)^{1/2}$ and (a) $\lambda = 0.25$, (b) $\lambda = 0.375$, (c) $\lambda = 0.5$, (d) $\lambda = 0.75$, (e) $\lambda = 0.95$, (f) $\lambda = 1.0$. Note how $\tilde{w}$ starts to "misbehave" as $\lambda$ increases (oversampling decreases). In fact, for $\lambda = 1$, $\tilde{w}$ is not even square-integrable any more (after [73]).
Problems

5.1 Characterization of local regularity: In Section 5.1.2, we have seen how the continuous wavelet transform can characterize the local regularity of a function. Take the Haar wavelet for simplicity.

(a) Consider the function

$$f(t) = \begin{cases} t & 0 \le t, \\ 0 & t < 0, \end{cases}$$

and show, using arguments similar to the ones used in the text, that

$$CWT_f(a, b) \sim a^{3/2},$$

around $b = 0$ and for small $a$.

(b) Show that if

$$f(t) = \begin{cases} t^n & 0 \le t, \\ 0 & t < 0, \end{cases} \qquad n = 0, 1, 2, \ldots$$

then

$$CWT_f(a, b) \sim a^{(2n+1)/2},$$

around $b = 0$ and for small $a$.

5.2 Consider the Haar wavelet

$$\psi(t) = \begin{cases} 1 & 0 \le t < 1/2, \\ -1 & 1/2 \le t < 1, \\ 0 & \text{otherwise.} \end{cases}$$

(a) Give the expression and the graph of its autocorrelation function $a(t)$,

$$a(t) = \int \psi(\tau)\,\psi(\tau - t)\,d\tau.$$

(b) Is $a(t)$ continuous? Differentiable? What is the decay of the Fourier transform $A(\omega)$ as $\omega \to \pm\infty$?

5.3 Nondownsampled filter bank: Refer to Figure 3.1 without downsamplers.

(a) Choose $\{H_0(z), H_1(z), G_0(z), G_1(z)\}$ as in an orthogonal two-channel filter bank. What is $y[n]$ as a function of $x[n]$? Note: $G_0(z) = H_0(z^{-1})$ and $G_1(z) = H_1(z^{-1})$, and assume FIR filters.

(b) Given the "energy" of $x[n]$, or $\|x\|^2$, what can you say about $\|x_0\|^2 + \|x_1\|^2$? Give either an exact expression, or bounds.

(c) Assume $H_0(z)$ and $G_0(z)$ are given; how can you find $H_1(z)$, $G_1(z)$ such that $y[n] = x[n]$? Calculate the example where

$$H_0(z) = G_0(z^{-1}) = 1 + 2z^{-1} + z^{-2}.$$

Is the solution $(H_1(z), G_1(z))$ unique? If not, what are the degrees of freedom? Note: In general, $y[n] = x[n-k]$ would be sufficient, but we concentrate on the zero-delay case.
5.4 Continuous wavelet transform: Consider a continuous wavelet transform

$$CWT_f(a, b) = \int_{-\infty}^{\infty}\frac{1}{\sqrt{a}}\,\psi\Big(\frac{t - b}{a}\Big)f(t)\,dt$$

using a Haar wavelet centered at the origin

$$\psi(t) = \begin{cases} 1 & -\tfrac{1}{2} \le t < 0, \\ -1 & 0 \le t < \tfrac{1}{2}, \\ 0 & \text{otherwise.} \end{cases}$$

(a) Consider the signal $f(t)$ given by

$$f(t) = \begin{cases} 1 & -\tfrac{1}{2} \le t \le \tfrac{1}{2}, \\ 0 & \text{otherwise.} \end{cases}$$

(i) Evaluate $CWT_f(a, b)$ for $a = 1, 1/2, 2$ and all shifts ($b \in \mathcal{R}$).

(ii) Sketch $CWT_f(a, b)$ for all $a$ ($a > 0$) and $b$, and indicate special behavior, if any (for example, regions where $CWT_f(a, b)$ is zero, behavior as $a \to 0$, anything else of interest).

(b) Consider the case $f(t) = \psi(t)$ and sketch the behavior of $CWT_f(a, b)$, similarly to (ii) above.

5.5 Consider Example 5.1, and choose $N$ vectors $\varphi_i$ ($N$ odd) for an expansion of $\mathcal{R}^2$, where $\varphi_i$ is given by

$$\varphi_i = [\cos(2\pi i/N), \sin(2\pi i/N)]^T, \qquad i = 0, \ldots, N-1.$$

Show that the set $\{\varphi_i\}$ constitutes a tight frame for $\mathcal{R}^2$, and give the redundancy factor.

5.6 Show that the set $\{\mathrm{sinc}(t - i/N)\}$, $i \in \mathcal{Z}$ and $N \in \mathcal{N}$, where

$$\mathrm{sinc}(t) = \frac{\sin(\pi t)}{\pi t},$$

forms a tight frame for the space of bandlimited signals (whose Fourier transforms are zero outside $(-\pi, \pi)$). Give the frame bounds and redundancy factor.

5.7 Consider a real $m \times n$ matrix $M$ with $m > n$, $\mathrm{rank}(M) = n$ and bounded entries.

(a) Show, given any nonzero $x \in \mathcal{R}^n$, that there exist real constants $A$ and $B$ such that

$$0 < A\|x\| \le \|Mx\| \le B\|x\| < \infty.$$

(b) Show that $M^T M$ is always invertible, and that a possible left inverse of $M$ is given by

$$\tilde{M} = (M^T M)^{-1}M^T.$$

(c) Characterize all other left inverses of $M$.

(d) Prove that $P = M\tilde{M}$ calculates the orthogonal projection of any vector $y \in \mathcal{R}^m$ onto the range of $M$.
Algorithms and Complexity

". . . divide each difficulty at hand into as many
pieces as possible and as could be required to
better solve them."
- Rene Descartes, Discourse on the Method

The theme of this chapter is "divide and conquer." It is the algorithmic counterpart of the multiresolution approximations seen for signal expansions in Chapters 3 and 4. The idea is simple: To solve a large-size problem, find smaller-size subproblems that are easy to solve and combine them efficiently to get the complete solution. Then, apply the division again to the subproblems and stop only when the subproblems are trivial.

What we just said in words is the key to the fast Fourier transform (FFT) algorithm, discussed in Section 6.1. Other computational tasks, such as fast convolution algorithms, have similar solutions.

The reason we are concerned with computational complexity is that the number 
of arithmetic operations is often what makes the difference between an impractical 
and a useful algorithm. While considerations other than just the raw numbers 
of multiplications and additions play an important role as well (such as memory 
accesses or communication costs), arithmetic or computational complexity is well 
studied for signal processing algorithms, and we will stay with this point of view in 
what follows. We will always assume discrete-time data and be mostly concerned 
with exact rather than approximate algorithms (that is, algorithms that compute 
the exact result in exact arithmetic). 
First, we will review classic digital signal processing algorithms, such as fast convolutions and fast Fourier transforms. Next, we discuss algorithms for multirate signal processing, since these are central for filter banks and discrete-time wavelet series or transforms. Then, algorithms for wavelet series computations are considered, including methods for the efficient evaluation of iterated filters. Even if the continuous wavelet transform cannot be evaluated exactly on a digital computer, approximations are possible, and we study their complexity. We conclude with some special topics, including FFT-based overlap-add/save fast convolution algorithms seen as filter banks.

6.1 Classic Results 

We briefly review the computational complexity of some basic discrete-time signal 
processing algorithms. For more details, we refer to [32, 40, 209, 334]. 

6.1.1 Fast Convolution 

Using transform techniques, the convolution of two sequences

$$c[n] = \sum_k a[k]\,b[n-k], \qquad (6.1.1)$$

reduces to the product of their transforms. If the sequences are of finite length, convolution becomes a polynomial product in transform domain. Taking the z-transform of (6.1.1) and replacing $z^{-1}$ by $x$, we obtain

$$C(x) = A(x) \cdot B(x). \qquad (6.1.2)$$

Thus, any efficient polynomial product algorithm is also an efficient convolution algorithm.
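The equivalence between (6.1.1) and (6.1.2) is easy to check. In the minimal Python sketch below, sequences are stored as coefficient lists with the coefficient of $x^0$ first, which is the convention obtained by replacing $z^{-1}$ with $x$; the function names are illustrative.

```python
def convolve(a, b):
    """Direct linear convolution c[n] = sum_k a[k] b[n - k], as in (6.1.1)."""
    c = [0] * (len(a) + len(b) - 1)
    for n in range(len(c)):
        for k in range(len(a)):
            if 0 <= n - k < len(b):
                c[n] += a[k] * b[n - k]
    return c

def poly_eval(coeffs, x):
    """Evaluate sum_i coeffs[i] * x**i with Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

a = [1, 2, 3]          # A(x) = 1 + 2x + 3x^2
b = [4, -1]            # B(x) = 4 - x
c = convolve(a, b)     # -> [4, 7, 10, -3]
```

Evaluating $C(x)$ at any point gives $A(x) \cdot B(x)$, which is exactly the statement of (6.1.2).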

Cook-Toom Algorithm If $A(x)$ and $B(x)$ are of degree $M$ and $N$ respectively, then $C(x)$ is of degree $M + N$ and has in general $M + N + 1$ nonzero coefficients. We are going to use the Lagrange interpolation theorem [32], stating that if we are given a set of $M + N + 1$ distinct points $\alpha_i$, $i = 0, \ldots, M + N$, then there exists exactly one polynomial $C(x)$ of degree $M + N$ or less which has the value $C(\alpha_i)$ when evaluated at $\alpha_i$, and is given by

$$C(x) = \sum_{i=0}^{M+N} C(\alpha_i)\prod_{j \ne i}\frac{x - \alpha_j}{\alpha_i - \alpha_j}, \qquad (6.1.3)$$

where

$$C(\alpha_i) = A(\alpha_i) \cdot B(\alpha_i), \qquad i = 0, \ldots, M + N.$$
Therefore, the Cook-Toom algorithm first evaluates $A(\alpha_i)$, $B(\alpha_i)$, $i = 0, \ldots, M + N$, then $C(\alpha_i)$ as in (6.1.2), and finally $C(x)$ as in (6.1.3). Since the $\alpha_i$'s are arbitrary, one can choose them as simple integers and then the evaluation of $A(\alpha_i)$ and $B(\alpha_i)$ can be performed with additions only (however, a very large number of these if $M$ and $N$ grow) or multiplications by integers. Similarly, the reconstruction formula (6.1.3) involves only integer multiplications up to a scale factor (the least common multiple of the denominators). Thus, if one distinguishes carefully multiplications between real numbers (such as the coefficients of the polynomials) and multiplication by integers (or rationals) as interpolation points, one can evaluate the polynomial product in (6.1.2) with $M + N + 1$ multiplications only, that is, linear complexity! While this algorithm is impractical for even medium $M$ and $N$, it is useful for deriving efficient small-size polynomial products, which can then be used in larger problems as we will see.
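A direct transcription of the Cook-Toom steps (evaluation, $M + N + 1$ pointwise products, and Lagrange reconstruction (6.1.3)) is sketched below in Python. Exact rational arithmetic (`fractions.Fraction`) keeps the interpolation exact; the points are assumed distinct, and only the products $A(\alpha_i) \cdot B(\alpha_i)$ count as "general" multiplications, as in the text. A practical implementation would precompute the reconstruction constants rather than rebuild the Lagrange basis on every call.

```python
from fractions import Fraction

def poly_eval(coeffs, x):
    """Horner evaluation of sum_i coeffs[i] * x**i."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def times_linear(p, root):
    """Coefficients of p(x) * (x - root), lowest degree first."""
    out = [Fraction(0)] * (len(p) + 1)
    for i, c in enumerate(p):
        out[i + 1] += c
        out[i] -= root * c
    return out

def cook_toom(a, b, points):
    """C(x) = A(x) B(x) from C(alpha_i) = A(alpha_i) B(alpha_i) and (6.1.3)."""
    size = len(a) + len(b) - 1                     # M + N + 1 coefficients
    if len(points) != size or len(set(points)) != size:
        raise ValueError("need M + N + 1 distinct interpolation points")
    values = [poly_eval(a, x) * poly_eval(b, x) for x in points]   # the only general products
    c = [Fraction(0)] * size
    for i, xi in enumerate(points):
        basis, denom = [Fraction(1)], Fraction(1)
        for j, xj in enumerate(points):
            if j != i:
                basis = times_linear(basis, xj)    # prod_{j != i} (x - alpha_j)
                denom *= xi - xj                   # prod_{j != i} (alpha_i - alpha_j)
        for d in range(size):
            c[d] += values[i] * basis[d] / denom
    return c

print([int(v) for v in cook_toom([1, 2, 3], [4, -1], [0, 1, -1, 2])])   # -> [4, 7, 10, -3]
```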

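Example 6.1 Product of Two Degree-1 Polynomials [32]

Take $A(x) = a_0 + a_1x$, $B(x) = b_0 + b_1x$, and choose $\alpha_0 = 0$, $\alpha_1 = 1$, $\alpha_2 = -1$. Then, according to the algorithm, we first evaluate $A(\alpha_i)$, $B(\alpha_i)$:

$$A(0) = a_0, \quad A(1) = a_0 + a_1, \quad A(-1) = a_0 - a_1,$$
$$B(0) = b_0, \quad B(1) = b_0 + b_1, \quad B(-1) = b_0 - b_1,$$

followed by $C(\alpha_i)$:

$$C(0) = a_0b_0, \quad C(1) = (a_0 + a_1)(b_0 + b_1), \quad C(-1) = (a_0 - a_1)(b_0 - b_1).$$

We then find the interpolation polynomials and call them $I_i(x)$:

$$I_0(x) = -(x - 1)(x + 1), \quad I_1(x) = \frac{x(x + 1)}{2}, \quad I_2(x) = \frac{x(x - 1)}{2}.$$

Finally, $C(x)$ is obtained as

$$C(x) = C(0)I_0(x) + C(1)I_1(x) + C(-1)I_2(x),$$

which could be compactly written as

$$\begin{pmatrix} c_0 \\ c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1/2 & -1/2 \\ -1 & 1/2 & 1/2 \end{pmatrix}\begin{pmatrix} b_0 & 0 & 0 \\ 0 & b_0 + b_1 & 0 \\ 0 & 0 & b_0 - b_1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} a_0 \\ a_1 \end{pmatrix}.$$

An improvement to this would be if one notes that the highest-order coefficient (in this case $c_2$) is always obtained as the product of the highest-order coefficients in polynomials $A(x)$ and $B(x)$, that is, in this case $c_2 = a_1b_1$. Then, one can find a new polynomial $T(x) = C(x) - a_1b_1x^2$ and apply the Cook-Toom algorithm on $T(x)$. Thus, with the choice $\alpha_0 = 0$ and $\alpha_1 = -1$, we get

$$\begin{pmatrix} c_0 \\ c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & -1 & 1 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} b_0 & 0 & 0 \\ 0 & b_0 - b_1 & 0 \\ 0 & 0 & b_1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 1 & -1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} a_0 \\ a_1 \end{pmatrix}. \qquad (6.1.4)$$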
The Cook-Toom algorithm is a special case of a more general class of polynomial 
product algorithms, studied systematically by Winograd [334]. 
Winograd Short Convolution Algorithms In this algorithm, the idea is to use the Chinese Remainder Theorem [32, 210], which states that an integer $n \in \{0, \ldots, M - 1\}$ (where $M = \prod m_i$ and the factors $m_i$ are pairwise coprime) is uniquely specified by its residues $n_i = n \bmod m_i$. The Chinese Remainder Theorem holds for polynomials as well. Thus, a possible way to evaluate (6.1.2) is to choose a polynomial $P(x)$ of degree at least $M + N + 1$, and compute

$$C(x) = C(x) \bmod P(x) = A(x) \cdot B(x) \bmod P(x),$$

where the first equality holds because the degree of $P(x)$ is larger than that of $C(x)$, and thus the reduction modulo $P(x)$ does not affect $C(x)$. Factorizing $P(x)$ into its coprime factors, $P(x) = \prod P_i(x)$, one can separately evaluate

$$C_i(x) = A_i(x) \cdot B_i(x) \bmod P_i(x)$$

(where $A_i(x)$ and $B_i(x)$ are the residues with respect to $P_i(x)$) and reconstruct $C(x)$ from its residues. Note that the Cook-Toom algorithm is a particular case of this algorithm when $P(x)$ equals $\prod(x - \alpha_i)$. The power of the algorithm is that if $P(x)$ is well chosen and factorized over the rationals, then the $P_i(x)$'s can be simple and the reduction operations as well as the reconstruction do not involve much computational complexity. A classic example is to choose $P(x)$ to be of the form $x^L - 1$ and to factor over the rationals. The factors, called cyclotomic polynomials [32], have coefficients $\{1, 0, -1\}$ up to relatively large $L$'s. Note that if $A(x)$ and $B(x)$ are of degree $L - 1$ or less and we compute

$$C(x) = A(x) \cdot B(x) \bmod (x^L - 1),$$

then we obtain the circular, or cyclic, convolution of the sequences $a[n]$ and $b[n]$:

$$c[n] = \sum_{k=0}^{L-1} a[k]\,b[(n - k) \bmod L].$$
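For $L = 4$ the factorization over the rationals is $x^4 - 1 = (x - 1)(x + 1)(x^2 + 1)$, with coefficients indeed in $\{1, 0, -1\}$. The Python sketch below follows the generic structure just described: reduce $A(x)$ and $B(x)$ modulo each factor, multiply the small residues, and reconstruct with the Chinese Remainder Theorem. The reconstruction uses precomputed polynomials $e_i(x)$ with $e_i \equiv 1 \bmod P_i(x)$ and $e_i \equiv 0$ modulo the other factors (worked out by hand for this $L$); only the residue products involve the data, $1 + 1 + 4$ multiplications here against 16 for the direct formula.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out

def poly_mod(p, m):
    """Remainder of p modulo the monic polynomial m (both lowest degree first)."""
    r = list(p)
    dm = len(m) - 1
    for top in range(len(r) - 1, dm - 1, -1):
        coef = r[top]
        if coef:
            for i in range(dm + 1):
                r[top - dm + i] -= coef * m[i]
    return (r + [0] * dm)[:dm]

FACTORS = [[-1, 1], [1, 1], [1, 0, 1]]                 # x - 1, x + 1, x^2 + 1
IDEMPOTENTS = [                                         # e_i = 1 mod P_i, 0 mod P_j (j != i)
    [Fraction(1, 4)] * 4,                               # (1 + x + x^2 + x^3) / 4
    [Fraction(1, 4), Fraction(-1, 4), Fraction(1, 4), Fraction(-1, 4)],
    [Fraction(1, 2), 0, Fraction(-1, 2), 0],            # (1 - x^2) / 2
]
X4_MINUS_1 = [-1, 0, 0, 0, 1]

def cyclic_convolution_crt(a, b):
    """c[n] = sum_k a[k] b[(n - k) mod 4], computed as A(x) B(x) mod (x^4 - 1)."""
    c = [Fraction(0)] * 4
    for factor, e in zip(FACTORS, IDEMPOTENTS):
        a_i, b_i = poly_mod(a, factor), poly_mod(b, factor)    # reductions: additions only
        c_i = poly_mod(poly_mul(a_i, b_i), factor)             # small product of residues
        c = [x + y for x, y in zip(c, poly_mod(poly_mul(c_i, e), X4_MINUS_1))]
    return c

def cyclic_convolution_direct(a, b):
    L = len(a)
    return [sum(a[k] * b[(n - k) % L] for k in range(L)) for n in range(L)]
```

For example, both functions return $[66, 68, 66, 60]$ for $a = [1, 2, 3, 4]$ and $b = [5, 6, 7, 8]$.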

Fourier-Domain Computation of Convolution and Interpolation at the Roots of Unity Choosing $P(x)$ as $x^L - 1$ and factoring down to first-order terms leads to

$$x^L - 1 = \prod_{i=0}^{L-1}(x - W_L^i),$$

where $W_L = e^{-j2\pi/L}$. For any polynomial $Q(x)$, it can be verified that

$$Q(x) \bmod (x - \alpha) = Q(\alpha).$$
[Figure 6.1: block diagram. $A(x)$ and $B(x)$ are each reduced modulo the factors $P_i(x)$; the residues are multiplied modulo $P_i(x)$; $C(x)$ is obtained by Chinese Remainder Theorem reconstruction from the residues.]
Figure 6.1 Generic fast convolution algorithms. The product $C(x) = A(x) \cdot B(x)$ is evaluated modulo $P(x)$. Particular cases are the Cook-Toom algorithm with $P(x) = \prod(x - \alpha_i)$ and Fourier-domain computation with $P(x) = \prod(x - W_L^i)$, where $W_L$ is the $L$th root of unity.



Therefore, reducing $A(x)$ and $B(x)$ modulo the various factors of $x^L - 1$ amounts to computing

$$A_i(x) = A(W_L^i), \qquad B_i(x) = B(W_L^i), \qquad i = 0, \ldots, L - 1,$$

which, according to (2.4.43), is simply taking the length-$L$ discrete Fourier transform of the sequences $a[n]$ and $b[n]$. Then

$$C_i(x) = C(W_L^i) = A(W_L^i) \cdot B(W_L^i), \qquad i = 0, \ldots, L - 1.$$

The reconstruction is simply the inverse Fourier transform. Of course, this is the convolution theorem of the Fourier transform, but it is seen as a particular case of either Lagrange interpolation or of the Chinese Remainder Theorem.
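The same structure with linear factors can be written out as below (Python, standard library only). Reduction modulo $x - W_L^i$ is done by synthetic division, whose remainder is exactly Horner's evaluation $A(W_L^i)$, so the residues are the DFT coefficients; the pointwise products are followed by an inverse DFT. This $O(L^2)$ sketch only illustrates the correspondence; the fast algorithms of Section 6.1.2 are what make it efficient.

```python
import cmath

def remainder_mod_linear(q, alpha):
    """Remainder of Q(x) divided by (x - alpha) via synthetic division, i.e. Q(alpha)."""
    r = 0
    for c in reversed(q):
        r = r * alpha + c
    return r

def cyclic_convolution_fourier(a, b):
    """Residues at the roots of unity, pointwise products, inverse DFT reconstruction."""
    L = len(a)
    roots = [cmath.exp(-2j * cmath.pi * i / L) for i in range(L)]   # W_L^i
    A = [remainder_mod_linear(a, w) for w in roots]                  # length-L DFT of a
    B = [remainder_mod_linear(b, w) for w in roots]                  # length-L DFT of b
    C = [x * y for x, y in zip(A, B)]                                # residue-domain products
    # reconstruction: c[n] = (1/L) sum_i C_i W_L^{-i n}
    return [sum(C[i] * cmath.exp(2j * cmath.pi * i * n / L) for i in range(L)) / L
            for n in range(L)]

c = cyclic_convolution_fourier([1, 2, 3, 4], [5, 6, 7, 8])
```

Up to rounding (and tiny imaginary parts), `c` equals the cyclic convolution $[66, 68, 66, 60]$.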

In conclusion, we have seen three convolution algorithms and they all had the 
generic structure shown in Figure 6.1. First, there is a reduction of the two poly- 
nomials involved, then there is a product in the residue domain (which is only a 
pointwise multiplication if the reduction is modulo first degree polynomials as in 
the Fourier case) and finally, a reconstruction step concludes the algorithm. 
6.1.2 Fast Fourier Transform Computation

The discrete Fourier transform of size $N$ computes (see (2.4.43))

$$X[k] = \sum_{n=0}^{N-1} x[n] \cdot W_N^{nk}, \qquad W_N = e^{-j2\pi/N}. \qquad (6.1.5)$$

This is equivalent to evaluating polynomials at the locations $x = W_N^k$. Because of the convolution theorem of the Fourier transform, it is clear that a good Fourier transform algorithm will lead to efficient convolution computation.

Let us recall from Section 2.4.8 that the Fourier transform matrix diagonalizes circular convolution matrices. That is, if $B$ is a circulant matrix with first line $(b_0\; b_{N-1}\; b_{N-2}\; \cdots\; b_1)$ (the line $i + 1$ is a right-circular shift of the line $i$), then the circular convolution of the sequence $b[n]$ with the sequence $a[n]$ is a sequence $c[n]$ given by

$$c = B \cdot a,$$

where the vectors $a$ and $c$ contain the sequences $a[n]$ and $c[n]$, respectively. This can be rewritten, using the convolution theorem of the Fourier transform, as

$$c = F^{-1} \cdot \Lambda \cdot F \cdot a,$$

where $\Lambda$ is a diagonal matrix with $F \cdot b$ as the diagonal entries (the vector $b$ contains the sequence $b[n]$). However, unless there is a fast way to compute the matrix-vector products involving $F$ (or $F^{-1}$, which is simply its conjugate transpose up to a scale factor), there is no computational advantage in using the Fourier domain for the computation of convolutions.

Several algorithms exist to speed up the product of a vector by the Fourier matrix $F$, which has entries $F_{ij} = W_N^{ij}$ following (6.1.5) (note that rows and columns are numbered starting from 0). We briefly review these algorithms and refer the reader to [32, 90, 209] for more details.

The Cooley-Tukey FFT Algorithm Assume that the length of the Fourier transform is a composite number, $N = N_1 \cdot N_2$. Perform the following change of variable in (6.1.5):

$$n = N_2 \cdot n_1 + n_2, \quad n_i = 0, \ldots, N_i - 1, \qquad k = k_1 + N_1 \cdot k_2, \quad k_i = 0, \ldots, N_i - 1. \qquad (6.1.6)$$

Then (6.1.5) becomes

$$X[k_1 + N_1k_2] = \sum_{n_1=0}^{N_1-1}\sum_{n_2=0}^{N_2-1} x[N_2n_1 + n_2]\,W_{N_1N_2}^{(N_2n_1 + n_2)(k_1 + N_1k_2)}. \qquad (6.1.7)$$
Using the simplifications

$$W_N^{lN} = 1, \qquad W_N^{lN_1} = W_{N_2}^{l}, \qquad W_N^{lN_2} = W_{N_1}^{l}, \qquad l \in \mathcal{Z},$$

and reordering terms, we can rewrite (6.1.7) as

$$X[k_1 + N_1k_2] = \sum_{n_2=0}^{N_2-1} W_{N_2}^{n_2k_2}\left[W_{N_1N_2}^{n_2k_1}\left(\sum_{n_1=0}^{N_1-1} x[N_2n_1 + n_2]\,W_{N_1}^{n_1k_1}\right)\right]. \qquad (6.1.8)$$

We recognize:

(a) The right sum as $N_2$ DFT's of size $N_1$.

(b) $N$ complex multiplications (by $W_{N_1N_2}^{n_2k_1}$).

(c) The left sum as $N_1$ DFT's of size $N_2$.
If $N_1$ and $N_2$ are themselves composite, one can iterate the algorithm. In particular, if $N = 2^l$ and choosing $N_1 = 2$, $N_2 = N/2$, (6.1.8) becomes

$$X[2k_2] = \sum_{n_2=0}^{N/2-1} W_{N/2}^{n_2k_2}\,\big(x[n_2] + x[n_2 + N/2]\big),$$

$$X[2k_2 + 1] = \sum_{n_2=0}^{N/2-1} W_{N/2}^{n_2k_2}\,\Big[W_N^{n_2}\big(x[n_2] - x[n_2 + N/2]\big)\Big].$$

Thus, at the cost of $N/2$ complex multiplications (by $W_N^{n_2}$) we have reduced the complexity of a size-$N$ DFT to two size-$(N/2)$ DFT's. Iterating $\log_2 N - 1$ times leads to trivial size-2 DFT's and thus, the complexity is of order $N\log_2 N$. Such an algorithm is called a radix-2 FFT and is very popular due to its simplicity and good performance.
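Iterating this split recursively gives a radix-2 FFT. In the Python sketch below, each call forms the sums $x[n_2] + x[n_2 + N/2]$ and the twiddled differences, then recurses on the two half-length transforms (here down to length 1, where the DFT is the identity). It is written for clarity, not speed; production code would typically use an iterative, in-place formulation or an established FFT library.

```python
import cmath

def dft(x):
    """Naive DFT used as a reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N)) for k in range(N)]

def fft_radix2(x):
    """Radix-2 FFT following the even/odd output split above (N a power of 2)."""
    N = len(x)
    if N & (N - 1):
        raise ValueError("length must be a power of 2")
    if N == 1:
        return list(x)
    half = N // 2
    # inputs of the size-N/2 DFT that produces X[2 k2]
    s = [x[n] + x[n + half] for n in range(half)]
    # inputs of the size-N/2 DFT that produces X[2 k2 + 1]: N/2 twiddles W_N^{n2}
    d = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / N) for n in range(half)]
    X = [0j] * N
    X[0::2] = fft_radix2(s)
    X[1::2] = fft_radix2(d)
    return X
```

Each level costs $N/2$ twiddle multiplications and there are $\log_2 N$ levels, which is the $N\log_2 N$ behavior stated above.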

The Good-Thomas or Prime Factor FFT Algorithm When performing the index mapping in the Cooley-Tukey FFT (see (6.1.6)), we did not require anything except that $N$ had to be composite. If the factors $N_1$ and $N_2$ are coprime, a more powerful mapping based on the Chinese Remainder Theorem can be used [32]. The major difference is that such a mapping avoids the twiddle-factor multiplications (item (b) above) present in the "middle" of the Cooley-Tukey FFT, thus mapping a length-$(N_1N_2)$ DFT ($N_1$ and $N_2$ being coprime) into:

(a) $N_1$ DFT's of length $N_2$,

(b) $N_2$ DFT's of length $N_1$.


This is equivalent to a two-dimensional FFT of size $N_1 \times N_2$. While this is more efficient than the Cooley-Tukey algorithm, it will require efficient algorithms for lengths which are powers of primes, for which the Cooley-Tukey algorithm can be used. In particular, efficient algorithms for Fourier transforms on lengths which are prime are needed.
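The Good-Thomas mapping can be sketched in Python as follows, assuming $N_1$ and $N_2$ coprime. The input index is $n = (N_2n_1 + N_1n_2) \bmod N$ and the output index is the Chinese Remainder Theorem solution of $k \equiv k_1 \pmod{N_1}$, $k \equiv k_2 \pmod{N_2}$; with these two maps, $W_N^{nk} = W_{N_1}^{n_1k_1}W_{N_2}^{n_2k_2}$, so the transform becomes a plain two-dimensional DFT without twiddle factors. The modular inverses use Python's three-argument `pow` (Python 3.8 or later); the small DFT's are naive.

```python
import cmath
import math

def dft(x):
    """Naive DFT, X[k] = sum_n x[n] W_N^{nk} with W_N = exp(-j 2 pi / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N)) for k in range(N)]

def good_thomas(x, N1, N2):
    """Prime factor FFT for coprime N1, N2: a 2-D DFT with no twiddle factors."""
    N = N1 * N2
    if len(x) != N or math.gcd(N1, N2) != 1:
        raise ValueError("need len(x) == N1 * N2 with N1, N2 coprime")
    q1 = pow(N2, -1, N1)                      # N2^{-1} mod N1
    q2 = pow(N1, -1, N2)                      # N1^{-1} mod N2
    # input map: n = (N2 n1 + N1 n2) mod N
    grid = [[x[(N2 * n1 + N1 * n2) % N] for n2 in range(N2)] for n1 in range(N1)]
    rows = [dft(row) for row in grid]                                     # N1 DFTs of length N2
    cols = [dft([rows[n1][k2] for n1 in range(N1)]) for k2 in range(N2)]  # N2 DFTs of length N1
    # output map (Chinese Remainder Theorem): k = k1 mod N1 and k = k2 mod N2
    X = [0j] * N
    for k1 in range(N1):
        for k2 in range(N2):
            X[(N2 * q1 * k1 + N1 * q2 * k2) % N] = cols[k2][k1]
    return X
```

The price for removing the twiddle factors is the more involved index arithmetic, which in practice is precomputed once per transform length.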

Rader's FFT When the length of a Fourier transform is a prime number $p$, then there exists a permutation of the input and output such that the problem becomes a circular convolution of size $p - 1$ (and some auxiliary additions for the frequency zero, which is treated separately). While the details are somewhat involved, Rader's method shows that prime-length Fourier transforms can be solved as convolutions and efficient algorithms will be in the generic form we saw in Section 6.1.1 (see the example in (6.1.4)). That is, the Fourier transform matrix $F$ can be written as

$$F = C\,M\,D, \qquad (6.1.9)$$

where $C$ and $D$ are matrices of output and input additions (which are rectangular) and $M$ is a diagonal matrix containing of the order of $2N$ multiplications.
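Rader's permutation can be illustrated in a few lines. The Python sketch below assumes $p$ is an odd prime and finds a generator $g$ of the multiplicative group modulo $p$ by brute force. Writing $n = g^m$ and $k = g^{-q}$ gives $nk \equiv g^{m-q}$, so with $u[m] = x[g^m]$ and $v[j] = W_p^{g^{-j}}$, every $X[g^{-q}] = x[0] + \sum_m u[m]\,v[(q - m) \bmod (p-1)]$ comes out of one cyclic convolution of length $p - 1$; $X[0]$ is the separate sum. The convolution is computed directly here; a real implementation would use one of the fast convolution algorithms of Section 6.1.1.

```python
import cmath

def dft(x):
    """Naive DFT used as a reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N)) for k in range(N)]

def primitive_root(p):
    """Smallest generator of the multiplicative group modulo the odd prime p (brute force)."""
    for g in range(2, p):
        if len({pow(g, e, p) for e in range(p - 1)}) == p - 1:
            return g
    raise ValueError("p must be an odd prime")

def cyclic_convolution(u, v):
    L = len(u)
    return [sum(u[m] * v[(q - m) % L] for m in range(L)) for q in range(L)]

def rader_dft(x):
    """Prime-length DFT: X[0] separately, the rest from one length-(p-1) cyclic convolution."""
    p = len(x)
    g = primitive_root(p)
    g_inv = pow(g, -1, p)
    u = [x[pow(g, m, p)] for m in range(p - 1)]                                   # x[g^m]
    v = [cmath.exp(-2j * cmath.pi * pow(g_inv, j, p) / p) for j in range(p - 1)]  # W_p^{g^{-j}}
    conv = cyclic_convolution(u, v)
    X = [0j] * p
    X[0] = sum(x)                                                                 # frequency zero
    for q in range(p - 1):
        X[pow(g_inv, q, p)] = x[0] + conv[q]                                     # X[g^{-q}]
    return X
```

The kernel $v$ depends only on $p$, so its transform can be precomputed; this is where the diagonal matrix $M$ of (6.1.9) comes from.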

The Winograd FFT Algorithm We saw that the Good-Thomas FFT mapped a
size-$(N_1 N_2)$ Fourier transform into a two-dimensional Fourier transform. Using
Kronecker products [32] (see (2.3.2)), we can thus write (up to the input and output
permutations of the Good-Thomas mapping)

$$F_{N_1 \cdot N_2} = F_{N_1} \otimes F_{N_2}. \qquad (6.1.10)$$

If $N_1$ and $N_2$ are prime, we can use Rader's algorithm to write $F_{N_1}$ and $F_{N_2}$ in
the form given in (6.1.9). Finally, using the property of Kronecker products given
in (2.3.3) that $(A \otimes B)(C \otimes D) = (A C) \otimes (B D)$ (if the products are all well
defined), we can rewrite (6.1.10) as

$$F_{N_1} \otimes F_{N_2} = (C_1 M_1 D_1) \otimes (C_2 M_2 D_2) = (C_1 \otimes C_2)(M_1 \otimes M_2)(D_1 \otimes D_2).$$

Since the size of $M_1 \otimes M_2$ is of the order of $(2N_1)(2N_2)$, we see that the complexity
is roughly $4N$ multiplications. In general, instead of the $N \log N$ behavior of the
Cooley-Tukey FFT, the Winograd FFT has a $C(N) \cdot N$ behavior, where $C(N)$ is
slowly growing with $N$. For example, for $N = 1008 = 7 \cdot 9 \cdot 16$, the Winograd
FFT uses 3548 multiplications, while for $N = 1024 = 2^{10}$, the split-radix FFT
[90] uses 7172 multiplications. Despite the computational advantage, the complex
structure of the Winograd FFT has led to mixed success in implementations and
the Cooley-Tukey FFT is still the most popular fast implementation of Fourier
transforms.
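The step that makes the Winograd construction work is the Kronecker mixed-product property. A small numerical check with integer matrices (the matrices are arbitrary examples) confirms it, and also shows that the Kronecker product of two diagonal "multiplication stages" is again diagonal, of size equal to the product of the sizes:

```python
def matmul(A, B):
    """Plain matrix product of nested lists."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def kron(A, B):
    """Kronecker product: block (i, j) of the result is A[i][j] * B."""
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

A = [[1, 2, 0], [3, -1, 4]]          # 2 x 3
B = [[2, 1], [0, 5]]                 # 2 x 2
C = [[1, 0], [2, 1], [-1, 3]]        # 3 x 2
D = [[1, 2, 3], [0, -2, 1]]          # 2 x 3
lhs = matmul(kron(A, B), kron(C, D))
rhs = kron(matmul(A, C), matmul(B, D))
print(lhs == rhs)  # True

M1 = [[2, 0], [0, 3]]
M2 = [[5, 0, 0], [0, 7, 0], [0, 0, 11]]
M = kron(M1, M2)                     # 6 x 6, diagonal with 2 * 3 entries
print(all(M[i][j] == 0 for i in range(6) for j in range(6) if i != j))  # True
```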
Algorithms for Trigonometric Transforms Related to the Fourier Transform

Most popular trigonometric transforms used in discrete-time signal processing are
closely related to the Fourier transform. Therefore, an efficient way to develop a
fast algorithm is to map the computational problem at hand into pre- and postprocessing
while having a Fourier transform at the center. We will briefly show this
for the discrete cosine transform (DCT). The DCT is defined as (see also (7.1.10)-
(7.1.11) in Chapter 7)

$$X[k] = \sum_{n=0}^{N-1} x[n] \cos\left(\frac{2\pi(2n+1)k}{4N}\right). \qquad (6.1.11)$$

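To make it unitary, a factor of $1/\sqrt{N}$ has to be included for $k = 0$, and $\sqrt{2/N}$ for
$k \neq 0$, but we skip the scaling since it can be included at the end. If we assume that
the transform length $N$ is even, then it can be verified [203] that a simple input
permutation given by

$$x'[n] = x[2n], \qquad x'[N-n-1] = x[2n+1], \qquad n = 0, \ldots, \frac{N}{2} - 1, \qquad (6.1.12)$$

transforms (6.1.11) into

$$X[k] = \sum_{n=0}^{N-1} x'[n] \cos\left(\frac{2\pi(4n+1)k}{4N}\right).$$

This can be related to the DFT of $x'[n]$, denoted by $X'[k] = \sum_{n=0}^{N-1} x'[n] W_N^{nk}$ with
$W_N = e^{-j2\pi/N}$, in the following manner:

$$X[k] = \cos\left(\frac{2\pi k}{4N}\right) \mathrm{Re}\{X'[k]\} + \sin\left(\frac{2\pi k}{4N}\right) \mathrm{Im}\{X'[k]\} = \mathrm{Re}\{W_{4N}^{k} X'[k]\}.$$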

Evaluating $X[k]$ and $X[N-k-1]$ at the same time, it is easy to see that they
follow from $X'[k]$ with a rotation by $2\pi k/4N$ [322]. Therefore, the length-$N$ DCT
on a real vector has been mapped into a permutation (6.1.12), a Fourier transform
of length $N$ and a set of $N/2$ rotations. Since the Fourier transform on a real vector
takes half the complexity of a general Fourier transform [209], this is a very efficient
way to compute DCTs. While there exist "direct" algorithms, it turns out that
mapping the DCT into a Fourier transform problem is just as efficient and much easier.
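The mapping can be checked numerically. In this minimal sketch a direct DFT stands in for the real-input FFT one would actually use, and the unnormalized DCT (6.1.11) is compared with permutation (6.1.12), a length-$N$ DFT and the final rotation $X[k] = \mathrm{Re}\{W_{4N}^k X'[k]\}$.

```python
import cmath
import math

def dct_direct(x):
    """(6.1.11) without normalization: X[k] = sum_n x[n] cos(2*pi*(2n+1)*k / 4N)."""
    N = len(x)
    return [sum(x[n] * math.cos(2 * math.pi * (2 * n + 1) * k / (4 * N)) for n in range(N))
            for k in range(N)]

def dct_via_dft(x):
    """Permutation (6.1.12), a length-N DFT, then a rotation per bin; N must be even."""
    N = len(x)
    xp = [0.0] * N
    for n in range(N // 2):
        xp[n] = x[2 * n]               # even samples, in order
        xp[N - n - 1] = x[2 * n + 1]   # odd samples, reversed, at the end
    Xp = [sum(xp[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
          for k in range(N)]
    return [(cmath.exp(-2j * math.pi * k / (4 * N)) * Xp[k]).real for k in range(N)]

x = [1.0, 3.0, -2.0, 0.5, 4.0, -1.0, 2.0, 0.0]
print(max(abs(a - b) for a, b in zip(dct_direct(x), dct_via_dft(x))))
```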

6.1.3 Complexity of Multirate Discrete-Time Signal Processing

The key to reducing the complexity in multirate signal processing is a very simple
idea: always operate at the slowest possible sampling frequency.
Figure 6.2 Implementation of filtering followed by downsampling by 2. (a)
Original system: $B(x)$ is filtered by $A(x)$ and downsampled by 2 to give $C_0(x)$.
(b) Decomposition of the input into even and odd components followed by filtering
with even and odd filters. D stands for a delay by 1.



Filtering and Downsampling Convolution followed by downsampling by 2 is
equivalent to computing only the even samples of the convolution. Using the
polyphase components of the sequences involved (see Section 3.2.1), the convolution
(6.1.1)-(6.1.2) followed by downsampling by 2 becomes

$$C_0(x) = A_0(x) B_0(x) + x A_1(x) B_1(x). \qquad (6.1.13)$$



This is equivalent to filtering the two independent signals $B_0(x)$ and $B_1(x)$ by the
half-length filters $A_0(x)$ and $A_1(x)$, respectively (see Figure 6.2). Because of the
independence, the complexities of the two polynomial products in (6.1.13) add up.
Assuming $A(x)$ and $B(x)$ are of odd degree $2M - 1$ and $2N - 1$, we have to evaluate two
products between polynomials of degree $M - 1$ and $N - 1$, which takes at least
$2(M + N - 1)$ multiplications. This is almost as much as the lower bound for
the full polynomial product (which is $2(M + N) - 1$ multiplications). If an FFT-based
convolution is used, we get some improvement. Assuming that an FFT takes
$C \cdot L \cdot \log_2 L$ operations,¹ it takes $2 \cdot C \cdot L \cdot \log_2 L + L$ operations to perform a
length-$L$ circular convolution (the transform of the filter is precomputed). Assume
a length-$N$ input and a length-$N$ filter and use a length-$2N$ FFT. Direct convolution
therefore takes $4 \cdot C \cdot N \cdot (\log_2 N + 1) + 2N$ operations. The computation
of (6.1.13) requires two FFTs of size $N$ (for $B_0(x)$ and $B_1(x)$), $2N$ operations for
the frequency-domain convolution, and a size-$N$ inverse FFT to recover $C_0(x)$,
that is, a total of $3 \cdot C \cdot N \cdot \log_2 N + 2N$. This is a saving of roughly 25% over the
nondownsampled convolution.
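The two operation counts above can be tabulated directly. This sketch assumes $C = 1$ purely for illustration; the printed saving is about one third for small $N$ and slowly approaches the asymptotic 25% as the $\log_2 N$ terms dominate.

```python
import math

def fft_conv_ops(N, C=1.0):
    """Nondownsampled FFT convolution, length-N input and filter, length-2N FFTs:
    4*C*N*(log2 N + 1) + 2N operations (filter transform precomputed)."""
    return 4 * C * N * (math.log2(N) + 1) + 2 * N

def polyphase_fft_ops(N, C=1.0):
    """Even outputs only via (6.1.13): two size-N FFTs, 2N products, one size-N inverse FFT."""
    return 3 * C * N * math.log2(N) + 2 * N

for N in (64, 1024, 2 ** 20):
    saving = 1 - polyphase_fft_ops(N) / fft_conv_ops(N)
    print(N, round(saving, 3))
```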



¹ $C$ is a small constant which depends on the particular length and FFT algorithm. For example,
the split-radix FFT of a real signal of length $N = 2^n$ requires $2^{n-1}(n-3) + 2$ real multiplications
and $2^{n-1}(3n-5) + 4$ real additions [90].
Substantial improvements appear only if straight polynomial products are
implemented, since the $4MN$ complexity of the nondownsampled product becomes a
$2MN$ complexity for computing the two products in (6.1.13). The main point is
that reducing the size of the polynomial products involved in (6.1.13) might allow
one to use almost optimal algorithms, which might not be practical for the full
product.

The discussion of the above simple example involving downsampling by 2
generalizes straightforwardly to any downsampling factor $K$. Then, a polynomial product
is replaced by $K$ products with $K$-times shorter polynomials.
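The general-$K$ version of (6.1.13) can be sketched as follows. The decomposition $c[Kn] = \sum_i (a_i * d_i)[n]$ with $a_i[m] = a[Km + i]$ and $d_i[m] = b[Km - i]$ is one way (an assumption of this sketch) to organize the $K$ short products; for $K = 2$ the $i = 1$ term is exactly $x A_1(x) B_1(x)$.

```python
def convolve(a, b):
    """Full linear convolution (polynomial product) of two coefficient lists."""
    c = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def decimate_polyphase(a, b, K):
    """Samples c[0], c[K], c[2K], ... of c = a * b, from K products of K-times shorter polynomials."""
    out = [0.0] * ((len(a) + len(b) - 2) // K + 1)
    for i in range(K):
        a_i = a[i::K]                                   # a_i[m] = a[K*m + i]
        if not a_i:
            continue
        d_i = [b[K * m - i] if K * m - i >= 0 else 0.0  # d_i[m] = b[K*m - i]
               for m in range((len(b) - 1 + i) // K + 1)]
        for n, v in enumerate(convolve(a_i, d_i)):
            out[n] += v
    return out

a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [0.5, -1.0, 2.0, 0.0, 3.0, 1.0, -2.0]
for K in (2, 3, 4):
    print(K, decimate_polyphase(a, b, K) == convolve(a, b)[::K])
```

Only the retained output samples are ever formed; the discarded ones are never computed.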

Upsampling and Interpolation The operation of upsampling by 2 followed by
interpolation filtering is equivalent to the following convolution:

$$C(x) = A(x) \cdot B(x^2), \qquad (6.1.14)$$

where $B(x)$ is the input and $A(x)$ the interpolation filter. Writing
$A(x) = A_0(x^2) + x A_1(x^2)$, the efficient way to compute (6.1.14) is

$$C(x) = B(x^2) A_0(x^2) + x B(x^2) A_1(x^2),$$

that is, two polynomial products where each of the terms is approximately of half
size, since $B(x^2) A_0(x^2)$ can be computed as $B(x) A_0(x)$ and then upsampled
(similarly for $B(x^2) A_1(x^2)$). That this problem seems very similar to filtering
and downsampling is no surprise, since they are duals of each other. If one writes
the matrix that represents convolution by $a[n]$ and downsampling by two, then its
transpose represents upsampling by two followed by interpolation with $\tilde{a}[n]$ (where
$\tilde{a}[n]$ is the time-reversed version of $a[n]$). This is shown in a simple three-tap filter
example below:

$$\begin{pmatrix} \cdots & a[2] & a[1] & a[0] & 0 & 0 & \cdots \\ \cdots & 0 & 0 & a[2] & a[1] & a[0] & \cdots \end{pmatrix}^T = \begin{pmatrix} \vdots & \vdots \\ a[2] & 0 \\ a[1] & 0 \\ a[0] & a[2] \\ 0 & a[1] \\ 0 & a[0] \\ \vdots & \vdots \end{pmatrix}.$$



The block diagram of an efficient implementation of upsampling and interpolation
is thus simply the transpose of the diagram in Figure 6.2. Both systems have the
same complexity, since they require the implementation of two half-length filters
($A_0(x)$ and $A_1(x)$) in the downsampled domain.
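The duality can be verified numerically with a short sketch: build the matrix of "convolve with $a$, keep even samples", transpose it, and compare its action with upsampling by two followed by filtering with the time-reversed $\tilde{a}$. The alignment offset $L - 1$ used here is this sketch's choice of indexing, not something fixed by the text.

```python
def convolve(a, b):
    c = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def down_conv_matrix(a, n_in):
    """Row r computes y[r] = sum_j a[2r - j] x[j]: convolution by a, then downsampling by 2."""
    n_out = (n_in + len(a) - 2) // 2 + 1
    return [[a[2 * r - j] if 0 <= 2 * r - j < len(a) else 0.0 for j in range(n_in)]
            for r in range(n_out)]

def up_interp(y, a_rev, n_keep, offset):
    """Upsample y by 2, filter with a_rev, and keep n_keep samples starting at offset."""
    u = [0.0] * (2 * len(y))
    u[::2] = y
    return convolve(u, a_rev)[offset:offset + n_keep]

a = [1.0, 2.0, 3.0]
T = down_conv_matrix(a, 6)                                   # 4 x 6
Tt = [list(col) for col in zip(*T)]                          # its transpose, 6 x 4
y = [1.0, -1.0, 2.0, 0.5]
via_transpose = [sum(row[r] * y[r] for r in range(len(y))) for row in Tt]
via_filter = up_interp(y, a[::-1], 6, offset=len(a) - 1)
print(via_transpose)
print(via_filter)
```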

Of course, upsampling by an arbitrary factor K followed by interpolation can 
be implemented by K small filters followed by upsampling, shifts, and summation. 
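A minimal sketch of this $K$-branch interpolator, assuming polyphase filters $a_i[m] = a[Km + i]$: each short filter runs at the input rate, and its outputs land at output positions $Km + i$ (the upsampling and shift), so the interleaving replaces the summation.

```python
def convolve(a, b):
    c = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def interp_direct(b, a, K):
    """Upsample b by K (insert K - 1 zeros), then filter with a."""
    u = [0.0] * (K * len(b))
    u[::K] = b
    return convolve(u, a)

def interp_polyphase(b, a, K):
    """Same output from K short filters run at the input rate: sample K*m + i = (b * a_i)[m]."""
    out = [0.0] * (K * len(b) + len(a) - 1)
    for i in range(K):
        a_i = a[i::K]                      # a_i[m] = a[K*m + i]
        if a_i:
            for m, v in enumerate(convolve(b, a_i)):
                out[K * m + i] = v
    return out

b = [1.0, -2.0, 0.5, 3.0]
a = [0.25, 0.5, 1.0, 0.5, 0.25, -0.125, 0.75]
for K in (2, 3, 4):
    print(K, interp_polyphase(b, a, K) == interp_direct(b, a, K))
```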
Figure 6.3 Iteration of filtering and downsampling: $B(x) \to A(x) \to \downarrow 2 \to A(x) \to \downarrow 2 \to A(x) \to \downarrow 2 \to \cdots$

Iterated Multirate Systems A case that appears often in practice, especially
around discrete-time wavelet series, is the iteration of an elementary block such
as filtering and downsampling, as shown in Figure 6.3. An elementary, if somewhat
surprising, result is the following: If the complexity of the first block is $C$
operations/input sample, then the upper bound on the total complexity, irrespective
of the number of stages, is $2C$. The proof is immediate, since the second block
has complexity $C$ but runs at half the sampling rate and, similarly, the $i$th block runs
$2^{i-1}$ times slower than the first one. Thus, the total complexity for $K$ blocks
becomes

$$C_{tot} = C + \frac{C}{2} + \frac{C}{4} + \cdots + \frac{C}{2^{K-1}} = 2C\left(1 - \frac{1}{2^K}\right) < 2C. \qquad (6.1.15)$$

This property has been used to design very sharp filters with low complexity in
[236]. While the complexity remains bounded, the delay does not. If the first block
contributes a delay $D$, the second will produce a delay $2D$ and the $i$th block a delay
$2^{i-1} D$. That is, the total delay becomes

$$D_{tot} = D + 2D + 4D + \cdots + 2^{K-1} D = (2^K - 1) D.$$

This large delay is a serious drawback, especially for real-time applications such as
speech coding.
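The bound (6.1.15) can be observed by counting multiplications in an actual cascade. This sketch uses an arbitrary length-8 filter, computes only the even outputs at each stage, and reports the cost per input sample of the first block and of all six blocks; the total stays just under twice the first (slightly above $2C(1 - 2^{-K})$ because of the filter tails). The delay helper simply evaluates $(2^K - 1)D$.

```python
def filter_downsample(x, h):
    """Compute only the even-indexed outputs of x * h; also count multiplications."""
    y, mults = [], 0
    for n in range(0, len(x) + len(h) - 1, 2):
        acc = 0.0
        for k, hk in enumerate(h):
            if 0 <= n - k < len(x):
                acc += hk * x[n - k]
                mults += 1
        y.append(acc)
    return y, mults

def iterated_cost(x, h, K):
    """Multiplications per input sample for the first block and for all K blocks."""
    counts, y = [], x
    for _ in range(K):
        y, m = filter_downsample(y, h)
        counts.append(m)
    return counts[0] / len(x), sum(counts) / len(x)

def total_delay(D, K):
    return (2 ** K - 1) * D

h = [0.1, 0.2, 0.3, 0.4, 0.4, 0.3, 0.2, 0.1]     # arbitrary length-8 filter
x = [float(n % 7) for n in range(4096)]
first, total = iterated_cost(x, h, K=6)
print(first, total, total / first)
print([total_delay(3, K) for K in range(1, 6)])
```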

Efficient Filtering Using Multirate Signal Processing One very useful applica- 
tion of multirate techniques to discrete-time signal processing has been the efficient 
computation of narrow-band filters. There are two basic ideas behind the method. 
First, the output of a lowpass filter can be downsampled, and thus, not all outputs 
have to be computed. Second, a very long narrow-band filter can be factorized into 
a cascade of several shorter ones and each of these can be downsampled as well. 
We will show the technique on a simple example, and refer to [67] for an in-depth 
treatment. 

Example 6.2 

Assume we desire a lowpass filter with a cutoff frequency $\pi/12$. Because of this cutoff
frequency, we can downsample the output, say by 8. Instead of a direct implementation, we
build a cascade of three filters with a cutoff frequency $\pi/3$, each downsampled by two. We
call such a filter a third-band filter. Using the interchange of downsampling and filtering
property, we get an equivalent filter with a z-transform

$$H_{equiv}(z) = H(z) \cdot H(z^2) \cdot H(z^4),$$

where $H(z)$ is the z-transform of the third-band lowpass filter. The spectral responses of
$H(e^{j\omega})$, $H(e^{j2\omega})$, and $H(e^{j4\omega})$ are shown in Figure 6.4(a) and their product, $H_{equiv}(z)$, is
shown in Figure 6.4(b), showing that a $\pi/12$ lowpass filter is realized. Note that its length
is approximately equal to $L + 2L + 4L = 7L$, where $L$ is the length of the filter with the
cutoff frequency $\pi/3$.

Figure 6.4 Spectral responses of individual filters and the resulting equivalent filter.
(a) $|H(e^{j\omega})|$, $|H(e^{j2\omega})|$, $|H(e^{j4\omega})|$. (b) $|H_{equiv}(e^{j\omega})| = |H(e^{j\omega})| \, |H(e^{j2\omega})| \, |H(e^{j4\omega})|$.
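The interchange property used in Example 6.2 can be checked numerically. The third-band filter below is an assumed design (a Hamming-windowed ideal lowpass with cutoff $\pi/3$ and 21 taps), since the example does not specify one; any $h$ would satisfy the identity. Three stages of "filter by $H$, downsample by 2" are compared with one filter $H(z)H(z^2)H(z^4)$ followed by downsampling by 8.

```python
import math

def convolve(a, b):
    c = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def upsample_taps(h, M):
    """Coefficients of H(z^M): M - 1 zeros inserted between the taps of h."""
    out = [0.0] * (M * (len(h) - 1) + 1)
    out[::M] = h
    return out

def third_band_lowpass(L=21):
    """Assumed design: Hamming-windowed ideal lowpass with cutoff pi/3."""
    M = (L - 1) / 2
    h = []
    for n in range(L):
        t = n - M
        ideal = 1 / 3 if t == 0 else math.sin(math.pi * t / 3) / (math.pi * t)
        h.append(ideal * (0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1))))
    return h

h = third_band_lowpass()
h_equiv = convolve(convolve(h, upsample_taps(h, 2)), upsample_taps(h, 4))  # H(z)H(z^2)H(z^4)

x = [math.sin(0.3 * n) + (n % 5) for n in range(200)]
y = x
for _ in range(3):                       # cascade of three short filters, each downsampled by 2
    y = convolve(y, h)[::2]
direct = convolve(x, h_equiv)[::8]       # one long filter, downsampled by 8

n = min(len(y), len(direct))
print(len(h_equiv), max(abs(u - v) for u, v in zip(y[:n], direct[:n])))
```

The equivalent filter has $7(L - 1) + 1$ taps, matching the "approximately $7L$" remark.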



If the filtered signal is needed at the full sampling rate, one can use upsampling 
and interpolation filtering and the same trick can be applied to that filter as well. 

Because of the cascade of shorter filters, and the fact that each stage is downsampled,
substantial savings in computational complexity are obtained.
How this technique can be used to derive arbitrarily sharp filters while keeping the
complexity bounded is shown in [236].
6.2 Complexity of Discrete Bases Computation

This section is concerned with the complexity of filter bank related computations. 
The basic ingredients are the multirate techniques of the previous section, as well 
as polyphase representations of filter banks. 

6.2.1 Two-Channel Filter Banks 

Assume a two-channel filter bank with filter impulse responses $h_0[n]$ and $h_1[n]$ of
length $L$. Recall from (3.2.22) in Section 3.2.1 that the channel signals equal

$$\begin{pmatrix} Y_0(z) \\ Y_1(z) \end{pmatrix} = \begin{pmatrix} H_{00}(z) & H_{01}(z) \\ H_{10}(z) & H_{11}(z) \end{pmatrix} \begin{pmatrix} X_0(z) \\ X_1(z) \end{pmatrix}. \qquad (6.2.1)$$

Unless there are special relationships among the filters, this amounts to four
convolutions by polyphase filters of length $L/2$ (assuming $L$ even). For comparison
purposes, we will count the number of operations for each new input sample. The
four convolutions operate at half the input rate and thus, for every two input
samples, we compute $4 \cdot L/2$ multiplications and $4(L/2 - 1) + 2$ additions. This leads
to $L$ multiplications and $L - 1$ additions/input sample, that is, exactly the same
complexity as a convolution by a single filter of size $L$. If an FFT-based convolution
algorithm is used, the transforms of $X_0(z)$ and $X_1(z)$ can be shared for the
computation of $Y_0(z)$ and $Y_1(z)$. Assuming again that a length-$N$ FFT uses $C \cdot N \cdot \log_2 N$
operations and that the input signal and the filters are of length $L$, we get, since
we need FFTs of length $L$ to compute the polynomial products in (6.2.1) (which
are of size $L/2 \times L/2$):

(a) $2 \cdot C \cdot L \cdot \log_2 L$ operations to get the transforms of $X_0(z)$ and $X_1(z)$,

(b) $4L$ operations to perform the frequency-domain convolutions,

(c) $2 \cdot C \cdot L \cdot \log_2 L$ operations for the inverse FFTs to get $Y_0(z)$ and $Y_1(z)$,

where we assumed that the transforms of the polyphase filters were precomputed.
That is, the Fourier-domain evaluation requires

$$4 \cdot C \cdot L \cdot \log_2 L + 4L \text{ operations},$$

which is of the same order as Fourier-domain computation of a length-$L$ filter
convolved with a length-$L$ signal.
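A time-domain sketch of (6.2.1), under one common polyphase convention that is an assumption here: $X_0$ holds $x[2m]$, $X_1$ holds $x[2m-1]$, $H_{i0}$ holds the even taps $h_i[2m]$ and $H_{i1}$ the odd taps $h_i[2m+1]$. The four half-length convolutions reproduce "filter, then keep even samples" for both channels.

```python
def convolve(a, b):
    c = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def analysis_direct(x, h):
    return convolve(x, h)[::2]

def pad_add(p, q):
    n = max(len(p), len(q))
    p = p + [0.0] * (n - len(p))
    q = q + [0.0] * (n - len(q))
    return [u + v for u, v in zip(p, q)]

def analysis_polyphase(x, h0, h1):
    """Y_i = H_i0 X_0 + H_i1 X_1: four convolutions with length-L/2 polyphase filters."""
    x0 = x[0::2]                 # X_0: x[2m]
    x1 = [0.0] + x[1::2]         # X_1: x[2m - 1]
    return [pad_add(convolve(h[0::2], x0), convolve(h[1::2], x1)) for h in (h0, h1)]

h0 = [0.5, 1.0, 1.5, 1.0, 0.5, 0.25]
h1 = [0.25, -0.5, 1.0, -1.0, 0.5, -0.25]
x = [float((3 * n) % 8) - 3.5 for n in range(11)]
y0, y1 = analysis_polyphase(x, h0, h1)
print(y0 == analysis_direct(x, h0), y1 == analysis_direct(x, h1))
```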

In [245], a precise analysis is made involving FFTs with optimized lengths so
as to minimize the operation count. Using the split-radix FFT algorithm [90], the
number of operations (multiplications plus additions/sample) becomes (for large
$L$)

$$4 \log_2 L + O(\log \log L), \qquad (6.2.2)$$

which is to be compared with $2L - 1$ multiplications plus additions for the direct
implementation. The algorithm starts to be effective for $L = 8$ and an FFT size of
16, where it achieves around 5 multiplications/point (rather than 8), and leads to
improvements by an order of magnitude for large filters such as $L = 64$ or 128. For
medium-size filters ($L = 6, \ldots, 12$), a method based on fast running convolution is
best (see [245] and Section 6.5 below).

Let us now consider some special cases where additional savings are possible. 

Linear Phase Filters It is well known that if a filter is symmetric or antisymmetric,
the number of operations can be halved in the direct implementation by
simply adding (or subtracting) the two input samples that are multiplied by the
same coefficient. This trick can be used in the downsampled case as well, that is,
filter banks with linear phase filters require half the number of multiplications, or
$L/2$ multiplications/input sample (the number of additions remains unchanged).
If the filter length is odd, the polyphase components are themselves symmetric or
antisymmetric, and the saving is obvious in (6.2.1).
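The pre-addition trick for a symmetric filter can be sketched as follows (the antisymmetric case would subtract instead). With `step=2` only the even outputs are computed, which is the filter-and-downsample case; the multiplication count is $\lceil L/2 \rceil$ per computed output instead of $L$.

```python
def convolve(a, b):
    c = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def fir_symmetric(x, h, step=1):
    """Filter x by a symmetric h (h[k] == h[L-1-k]), pre-adding the two input samples
    that share a coefficient; step=2 computes only the even outputs."""
    L = len(h)
    xp = [0.0] * (L - 1) + list(x) + [0.0] * (L - 1)   # xp[m + L - 1] = x[m]
    y, mults = [], 0
    for n in range(0, len(x) + L - 1, step):
        acc = 0.0
        for k in range(L // 2):
            # x[n - k] and x[n - (L - 1 - k)] are both multiplied by h[k]
            acc += h[k] * (xp[n - k + L - 1] + xp[n + k])
            mults += 1
        if L % 2:                                        # middle tap of an odd-length filter
            acc += h[L // 2] * xp[n - L // 2 + L - 1]
            mults += 1
        y.append(acc)
    return y, mults

x = [1.0, -2.0, 0.5, 3.0, 0.0, 2.5, -1.0, 4.0, 1.5]
h = [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
y, mults = fir_symmetric(x, h, step=2)
print(y == convolve(x, h)[::2], mults, len(y))
```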

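Certain linear phase filter banks can be written in cascade form [321] (see
Section 3.2.4). That is, their polyphase matrix is of the form given in (3.2.70):

$$H_p(z) = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} \prod_{i=1}^{K-1} \begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix} \begin{pmatrix} 1 & \alpha_i \\ \alpha_i & 1 \end{pmatrix}.$$

The individual $2 \times 2$ symmetric matrices can be written as (we assume $\alpha_i \neq 1$)

$$\begin{pmatrix} 1 & \alpha_i \\ \alpha_i & 1 \end{pmatrix} = \frac{1 - \alpha_i}{2} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} \frac{1 + \alpha_i}{1 - \alpha_i} & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$

By gathering the scale factors together, we see that each new block in the cascade
structure (which increases the length of the filters by two) adds only one
multiplication. Thus, we need order-$(L/2)$ multiplications to compute a new output in
each channel, or $L/4$ multiplications/input sample. The number of additions is of
the order of $L$ additions/input sample [321].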
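The factorization of a symmetric lattice block can be checked directly: two butterflies (additions only) around a single multiplication by the precomputed ratio $(1+\alpha)/(1-\alpha)$, with the scale $(1-\alpha)/2$ deferred, reproduce $\begin{pmatrix} 1 & \alpha \\ \alpha & 1 \end{pmatrix}$. A minimal sketch:

```python
def lattice_stage(u, v, alpha):
    """Apply [[1, alpha], [alpha, 1]] up to the scale (1 - alpha)/2 with one multiplication."""
    s, d = u + v, u - v                    # butterfly: additions only
    s *= (1 + alpha) / (1 - alpha)         # the single multiplication (ratio precomputed in practice)
    return s + d, s - d                    # second butterfly

def check(alpha, u, v):
    p, q = lattice_stage(u, v, alpha)
    scale = (1 - alpha) / 2                # gathered at the end of the cascade in practice
    return scale * p, scale * q

print(check(0.3, 1.0, 2.0))   # approximately (1.6, 2.3) = (u + alpha*v, alpha*u + v)
```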

Classic QMF Solution The classic QMF solution given in (3.2.34)-(3.2.35) (see
Figure 6.5(a)), besides using even-length linear phase filters, forces the highpass
filter to be equal to the lowpass, modulated by $(-1)^n$. The polyphase matrix is
therefore:

$$H_p(z) = \begin{pmatrix} H_0(z) & H_1(z) \\ H_0(z) & -H_1(z) \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} H_0(z) & 0 \\ 0 & H_1(z) \end{pmatrix},$$
where $H_0$ and $H_1$ are the polyphase components of the prototype filter $H(z)$. The
factorized form on the right indicates that the complexity is halved, and an obvious
implementation is shown in Figure 6.5(b). Recall that this scheme only approximates
perfect reconstruction when using FIR filters.

Figure 6.5 Classic QMF filter bank. (a) Initial filter bank with filters $H(z)$ and $H(-z)$.
(b) Efficient implementation using polyphase components and a butterfly.
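The factorized QMF polyphase matrix translates into two half-length convolutions and one butterfly. This sketch assumes the polyphase convention $X_0[m] = x[2m]$, $X_1[m] = x[2m-1]$ and checks the result against direct filtering by $h[n]$ and $(-1)^n h[n]$ followed by downsampling; the prototype taps are arbitrary.

```python
def convolve(a, b):
    c = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def qmf_analysis(x, h):
    """Lowpass h and highpass (-1)^n h[n]: two half-length convolutions plus a butterfly."""
    x0, x1 = x[0::2], [0.0] + x[1::2]      # X_0: x[2m], X_1: x[2m - 1] (assumed convention)
    p = convolve(h[0::2], x0)              # H_0(z) X_0(z)
    q = convolve(h[1::2], x1)              # H_1(z) X_1(z)
    n = max(len(p), len(q))
    p, q = p + [0.0] * (n - len(p)), q + [0.0] * (n - len(q))
    return [u + v for u, v in zip(p, q)], [u - v for u, v in zip(p, q)]

h = [0.1, 0.4, 0.9, 0.9, 0.4, 0.1]
x = [float(n % 5) - 2.0 for n in range(13)]
lo, hi = qmf_analysis(x, h)
h_high = [(-1) ** n * hn for n, hn in enumerate(h)]
print(lo == convolve(x, h)[::2], hi == convolve(x, h_high)[::2])
```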

Orthogonal Filter Banks As seen in Section 3.2.4, orthogonal filter banks have
strong structural properties. In particular, because the highpass is the time-reversed
version of the lowpass filter modulated by $(-1)^n$, the polyphase matrix has the
following form:

$$H_p(z) = \begin{pmatrix} H_{00}(z) & H_{01}(z) \\ -\tilde{H}_{01}(z) & \tilde{H}_{00}(z) \end{pmatrix}, \qquad (6.2.3)$$

where $\tilde{H}_{00}(z)$ and $\tilde{H}_{01}(z)$ are time-reversed versions of $H_{00}(z)$ and $H_{01}(z)$, and
$H_{00}(z)$ and $H_{01}(z)$ are the two polyphase components of the lowpass filter. If
$H_{00}(z)$ and $H_{01}(z)$ were of degree zero, it is clear that the matrix in (6.2.3) would
be a rotation matrix (up to a scale factor), which can be implemented with three
multiplications. It turns out that for arbitrary degree polyphase components, terms
can still be gathered into rotations, saving 25% of multiplications (at the cost of 25%
more additions) [104].
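This rotation property is more obvious in the lattice structure form of orthogonal
filter banks [310]. We recall that the two-channel lattice factorizes the paraunitary
polyphase matrix into the following form (see (3.2.60)):

$$H_p(z) = \begin{pmatrix} H_{00}(z) & H_{01}(z) \\ H_{10}(z) & H_{11}(z) \end{pmatrix} = U_0 \prod_{i=1}^{N-1} \begin{pmatrix} 1 & 0 \\ 0 & z^{-1} \end{pmatrix} U_i,$$

where filters are of length $L = 2N$ and the matrices $U_i$ are $2 \times 2$ rotations. Such
rotations can be written as (where we use the shorthand $a_i$ and $b_i$ for $\cos(\alpha_i)$ and
$\sin(\alpha_i)$, respectively) [32]

$$U_i = \begin{pmatrix} a_i & -b_i \\ b_i & a_i \end{pmatrix} = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} a_i - b_i & 0 & 0 \\ 0 & a_i + b_i & 0 \\ 0 & 0 & b_i \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & -1 \end{pmatrix}. \qquad (6.2.4)$$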
Table 6.1 Number of arithmetic operations/input sample for various two-channel
filter banks with length-L filters, where μ and α stand for multiplications and additions,
respectively.

Filter bank type                                          # of μ     # of α
General two-channel filter bank                           L          L - 1
Linear phase filter bank, direct form                     L/2        L - 1
Linear phase filter bank, lattice form                    L/4        L
QMF filter bank                                           L/2        L/2
Orthogonal filter bank, direct form                       L          L - 1
Orthogonal filter bank, lattice form                      3L/4       3L/4
Orthogonal filter bank, denormalized lattice              L/2        3L/4
Frequency-domain computation (assuming large L) [245]     log2 L     3 log2 L


Thus, only three multiplications are needed per rotation, or $3N$ for the whole lattice. Since the
lattice works in the downsampled domain, the complexity is $3N/2$ multiplications
or, since $N = L/2$, $3L/4$ multiplications/input sample and a similar number of
additions. A further trick consists in denormalizing the diagonal matrix in (6.2.4)
(taking out $b_i$, for example) and gathering all scale factors at the end of the lattice.
Then, the complexity becomes $(L/2) + 1$ multiplications/input sample. The number
of additions remains unchanged.
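The three-multiplication rotation and its denormalized two-multiplication variant can be sketched as follows. This assumes the sign convention $U = \begin{pmatrix} a & -b \\ b & a \end{pmatrix}$ with $a = \cos\alpha$, $b = \sin\alpha$; the denormalized form returns the rotated pair divided by $b$, which is why it needs $\sin\alpha \neq 0$ and a final rescaling.

```python
import math

def rotate_direct(u, v, angle):
    a, b = math.cos(angle), math.sin(angle)
    return a * u - b * v, b * u + a * v        # four multiplications

def rotate_3mult(u, v, angle):
    """Rotation via the factorization (6.2.4): three multiplications, three additions."""
    a, b = math.cos(angle), math.sin(angle)
    c1, c2 = a - b, a + b                      # precomputed once per lattice stage
    t = b * (u - v)                            # multiplication 1
    return c1 * u + t, c2 * v + t              # multiplications 2 and 3

def rotate_denormalized(u, v, angle):
    """Rotation divided by b = sin(angle): two multiplications; b is applied at the end."""
    a, b = math.cos(angle), math.sin(angle)
    d1, d2 = (a - b) / b, (a + b) / b          # precomputed once per lattice stage
    t = u - v
    return d1 * u + t, d2 * v + t

print(rotate_direct(1.0, 2.0, 0.3))
print(rotate_3mult(1.0, 2.0, 0.3))
```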

Table 6.1 summarizes the complexity of various filter banks. Except for the last 
entry, time-domain computation is assumed. Note that in the frequency-domain 
computation, savings due to symmetries become minor. 

6.2.2 Filter Bank Trees and Discrete-Time Wavelet Transforms 

Filter bank trees come mostly in two flavors: the full-grown tree, where each branch 
is again subdivided, and the octave-band tree, where only the lower branch is further 
subdivided. 

First, it is clear that techniques used to improve two-channel banks will improve 
any tree structure when applied to each elementary bank in the tree. Then, specific 
techniques can be developed to compute tree structures. 



Full Trees If an elementary block (a two-channel filter bank downsampled by two)
has complexity $C_0$, then a $K$-stage full tree with $2^K$ leaves has complexity $K \cdot C_0$.
This holds because the initial block is followed by two blocks at half rate (which
contribute $2 \cdot C_0/2$), four blocks at quarter rate and so on. Thus, while the number
of leaves grows exponentially with $K$, the complexity only grows linearly with $K$.
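The counting argument amounts to a one-line sum: stage $s$ has $2^{s-1}$ blocks, each running at $1/2^{s-1}$ of the input rate, so every stage contributes $C_0$. A small sketch with an assumed $C_0 = 8$:

```python
def full_tree_cost(C0, K):
    """Operations/input sample of a K-stage full tree: stage s has 2**(s-1) blocks at rate 1/2**(s-1)."""
    return sum(2 ** (s - 1) * (C0 / 2 ** (s - 1)) for s in range(1, K + 1))

for K in (1, 2, 3, 5):
    print(K, 2 ** K, full_tree_cost(8.0, K))   # leaves grow as 2**K, cost as K * C0
```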

Let us discuss alternatives for the computation of the full tree structure in the 
simplest, two-stage case, shown in Figure 6.6(a). It can be transformed into the 
four-channel filter bank shown in Figure 6.6(b) by passing the second stage of
filters across the first stage of downsampling. While the structure is simpler, the
length of the filters involved is now of the order of $3L$ if $H_i(z)$ is of degree $L - 1$.
Thus, unless the filters are implemented in factorized form, this is more complex 
than the initial structure. However, the regular structure might be preferred in 
hardware implementations. 

Let us consider a Fourier-domain implementation. A simple trick consists of
implementing the first stage with FFTs of length $N$ and the second stage with
FFTs of length $N/2$. Then, one can perform the downsampling in the Fourier domain
and then, the forward FFT of the second stage cancels the inverse FFT of the first
stage. The downsampling in the Fourier domain requires $N/2$ additions, since if $X[k]$ is
a length-$N$ Fourier transform, the length-$N/2$ Fourier transform of its downsampled
version is

$$Y[k] = \frac{1}{2}\left(X[k] + X[k + N/2]\right).$$

Figure 6.6(c) shows the algorithm schematically, where, for simplicity, the filters
rather than the polyphase components are shown. The polyphase implementation
requires separating even and odd samples in the time domain. The even samples are
obtained from the Fourier transform $X[k]$ as



$$y[2n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] \, W_N^{-2nk} = \frac{1}{N} \sum_{k=0}^{N/2-1} \left(X[k] + X[k + N/2]\right) W_{N/2}^{-nk}, \qquad (6.2.5)$$

while the odd ones require a phase shift

$$y[2n+1] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] \, W_N^{-(2n+1)k} = \frac{1}{N} \sum_{k=0}^{N/2-1} W_N^{-k} \left(X[k] - X[k + N/2]\right) W_{N/2}^{-nk}. \qquad (6.2.6)$$

If the next stage uses a forward FFT of size $N/2$ on $y[2n]$ and $y[2n+1]$, the inverse
FFTs in (6.2.5) and (6.2.6) are cancelled and only the phase shift in (6.2.6) remains.
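The relations behind (6.2.5) and (6.2.6) can be checked with direct DFTs: the length-$N/2$ spectrum of the even samples is $\frac{1}{2}(X[k] + X[k+N/2])$, and that of the odd samples is $\frac{1}{2} W_N^{-k}(X[k] - X[k+N/2])$, the phase shift that survives when the inverse and forward FFTs cancel. A minimal sketch:

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def fourier_downsample(X):
    """Length-N/2 spectra of the even and odd samples, obtained directly from X[k]."""
    N = len(X)
    even = [0.5 * (X[k] + X[k + N // 2]) for k in range(N // 2)]
    odd = [0.5 * cmath.exp(2j * cmath.pi * k / N) * (X[k] - X[k + N // 2])  # W_N^{-k} phase shift
           for k in range(N // 2)]
    return even, odd

x = [1.0, 4.0, -2.0, 0.5, 3.0, -1.0, 2.0, 0.0]
even, odd = fourier_downsample(dft(x))
print(max(abs(a - b) for a, b in zip(even, dft(x[0::2]))),
      max(abs(a - b) for a, b in zip(odd, dft(x[1::2]))))
```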
Figure 6.6 Two-stage full-tree filter bank. (a) Initial system. (b) Parallelized
system. (c) Fourier-domain computation with implicit cancellation of forward
and inverse transforms between stages. FS stands for Fourier-domain
downsampling. Note that in the first stage the $H_i[k]$ are obtained as outputs of a
size-$N$ FFT, while in the second stage, they are outputs of a size-$N/2$ FFT.

These complex multiplications can be combined with the subsequent filtering in
the Fourier domain. Therefore, we have shown how to merge two subsequent stages
with only $N$ additions. Note that the lengths of the FFTs have to be chosen carefully
so that linear convolution is computed at each stage. In the case discussed here,
$N/2$ (the size of the second FFT) has to be larger than $(3L + L_s - 2)/2$, where $L$
and $L_s$ are the filter and signal lengths, respectively (the factor 1/2 comes from the
fact that we deal with polyphase components).

While this merging improves the computational complexity, it also constrains 
the FFT length. That is, the length will not be optimal for the first or the second 
stage, resulting in a certain loss of optimality. 
Octave-Band Trees and Discrete-Time Wavelet Series In this case, we can use
the property of iterated multirate systems which leads to a complexity independent
of the number of stages, as seen in (6.1.15). For example, assuming a Fourier-domain
implementation of an elementary two-channel bank which uses about $4 \log_2 L$
operations/input sample as in (6.2.2), a $K$-stage discrete-time wavelet series expansion
requires of the order of

$$8 \log_2 L \left(1 - 1/2^K\right) \text{ operations}$$

for long filters implemented in the Fourier domain, and

$$4L \left(1 - 1/2^K\right) \text{ operations} \qquad (6.2.7)$$

for short filters implemented in the time domain. As mentioned earlier, filters of length
8 or more are more efficiently implemented with Fourier-domain techniques.
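Evaluating the two estimates side by side shows where the crossover lies: for $L = 4$ they coincide, and from $L = 8$ on the Fourier-domain count is lower. This sketch simply tabulates the two formulas for $K = 5$ octaves.

```python
import math

def octave_ops_time(L, K):
    """(6.2.7): about 2L operations/sample per two-channel bank, iterated over K octaves."""
    return 4 * L * (1 - 2 ** -K)

def octave_ops_fourier(L, K):
    """About 4 log2 L operations/sample per bank, as in (6.2.2), iterated over K octaves."""
    return 8 * math.log2(L) * (1 - 2 ** -K)

for L in (4, 8, 16, 64):
    print(L, octave_ops_time(L, 5), octave_ops_fourier(L, 5))
```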

Of course, the merging trick of inverse and forward FFT's between stages can 
be used here as well. A careful analysis made in [245] shows that merging of two 
stages pays off for filter lengths of 16 or more. Merging of more stages is marginally 
interesting for large filters since it involves very large FFT's, which is probably 
impractical. Again, fast running convolution methods are best for medium size 
filters (L = 6, . . . , 12) [245]. Finally, all savings due to special structures, such as 
orthogonality or linear phase, carry over to tree structures as well. 

The study of hardware implementations of discrete-time wavelet transforms is 
an important topic as well. In particular, the fact that different stages run at 
different sampling rates makes the problem nontrivial. For a detailed study and 
various solutions to this problem, see [219]. 

6.2.3 Parallel and Modulated Filter Banks 

General parallel filter banks have an obvious implementation in the polyphase
domain. If we have a filter bank with $K$ channels and downsampling by $M$, we get,
instead of (6.2.1), a $K \times M$ matrix times a size-$M$ vector product (where all entries
are polynomials). The complexity of straightforward computation is comparable,
when $K = M$, to a single convolution since we have $M$ filters downsampled by $M$.
Fourier methods require $M$ forward transforms (one for each polyphase component),
$K \cdot M$ frequency-domain convolutions, and finally, $K$ inverse Fourier transforms to
obtain the channel signals in the time domain.

A more interesting case appears when the filters are related to each other. The 
most important example is when all filters are related to a single prototype filter 
through modulation. 

The classic example is (see (3.4.13)-(3.4.14) in Section 3.4.3)

$$H_i(z) = H_{pr}(W_N^i z), \qquad i = 0, \ldots, N-1, \qquad W_N = e^{-j2\pi/N}, \qquad (6.2.8)$$

$$h_i[n] = W_N^{-in} h_{pr}[n]. \qquad (6.2.9)$$
Figure 6.7 Modulated filter bank implemented with an FFT.

This corresponds to a short-time Fourier or Gabor transform filter bank. The
polyphase matrix with respect to downsampling by $N$ has the form shown below
(an example for $N = 3$ is given):

$$H_p(z) = \begin{pmatrix} H_{pr0}(z) & H_{pr1}(z) & H_{pr2}(z) \\ H_{pr0}(z) & W_3 H_{pr1}(z) & W_3^2 H_{pr2}(z) \\ H_{pr0}(z) & W_3^2 H_{pr1}(z) & W_3 H_{pr2}(z) \end{pmatrix} = F_3 \begin{pmatrix} H_{pr0}(z) & 0 & 0 \\ 0 & H_{pr1}(z) & 0 \\ 0 & 0 & H_{pr2}(z) \end{pmatrix}, \qquad (6.2.10)$$

where $H_{pri}(z)$ is the $i$th polyphase component of the filter $H_{pr}(z)$ and $F_3$ is the
size-3 discrete Fourier transform matrix. The implementation is shown in Figure 6.7.
This fast implementation of modulated filter banks using polyphase filters of the
prototype filter followed by a fast Fourier transform is central in several applications
such as transmultiplexers. This fast algorithm goes back to the early 70's [25]. The
complexity is now substantially reduced. The polyphase filters require $N$ times less
complexity than a full filter bank, and the FFT adds an order $N \log_2 N$ operations
per $N$ input samples. The complexity is of the order of

$$\left(2 \frac{L}{N} + 2 \log_2 N\right) \text{ operations/input sample}, \qquad (6.2.11)$$



that is, a substantial reduction over a single, length-$L$ filtering operation. Further
reductions are possible by implementing the polyphase filters in the frequency domain
(reducing the term of order $L$ to $\log_2 L$) and merging FFTs into a multidimensional
one [210]. Another important and efficient filter bank is based on cosine modulation.
It is sometimes referred to as lapped orthogonal transforms (LOTs) [188] or local
cosine bases [63]. Several possible LOTs have been proposed in the literature
and are of the general form described in (3.4.17)-(3.4.18) in Section 3.4.3. Using
trigonometric identities, this can be reduced to $N$ polyphase filters followed by a
DCT-type of transform of length $N$ (see (6.1.11)). Other LOTs lead to various
length-$N$ or length-$2N$ trigonometric transforms, preceded by polyphase filters of
length two or larger [187].
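The modulated filter bank of (6.2.8)-(6.2.10) can be sketched as $N$ polyphase filters of the prototype followed by one $N$-point transform per output time. Two assumptions are made here: the input polyphase components are taken as $x[Nm - r]$, with which the branch combination comes out with the kernel $W_N^{-ir}$ (a conjugated DFT, equivalently $N$ times an inverse DFT); other polyphase conventions give $F_N$ itself, as in (6.2.10). The transform is computed directly rather than by an FFT.

```python
import cmath

def convolve(a, b):
    c = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def modulated_bank_direct(x, h, N):
    """N filters h_i[n] = W_N^{-in} h[n], W_N = exp(-2j*pi/N), each downsampled by N."""
    out = []
    for i in range(N):
        hi = [cmath.exp(2j * cmath.pi * i * n / N) * hn for n, hn in enumerate(h)]
        out.append(convolve(x, hi)[::N])
    return out

def modulated_bank_polyphase(x, h, N):
    """N polyphase filters of the prototype, then one N-point transform per output time."""
    n_out = (len(x) + len(h) - 2) // N + 1
    v = []
    for r in range(N):
        hr = h[r::N]                                              # h[Nq + r]
        xr = [x[N * m - r] if 0 <= N * m - r < len(x) else 0.0 for m in range(n_out)]
        vr = convolve(hr, xr)[:n_out] if hr else []
        v.append(vr + [0.0] * (n_out - len(vr)))
    # y_i[m] = sum_r W_N^{-ir} v_r[m]; an FFT would be used here in practice
    return [[sum(cmath.exp(2j * cmath.pi * i * r / N) * v[r][m] for r in range(N))
             for m in range(n_out)] for i in range(N)]

h = [0.05 * (k + 1) for k in range(12)]                # arbitrary length-12 prototype
x = [float((7 * n) % 9) - 4.0 for n in range(20)]
d = modulated_bank_direct(x, h, 4)
p = modulated_bank_polyphase(x, h, 4)
print(max(abs(u - w) for rd, rp in zip(d, p) for u, w in zip(rd, rp)))
```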

6.2.4 Multidimensional Filter Banks 

Computational complexity is of particularly great concern in multidimensional
systems, since, for example, filtering an $N \times N$ image with a filter of size $L \times L$
requires of the order of $N^2 \cdot L^2$ operations. If the filter is separable, that is,
$H(z_1, z_2) = H_1(z_1) H_2(z_2)$, then filtering on rows and columns can be done
separately and the complexity is reduced to an order $2N^2 L$ operations ($N$ row filterings
and $N$ column filterings, each using $NL$ operations).
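A small sketch comparing full 2-D convolution with row-then-column filtering for a separable kernel $k[p][q] = h_c[p] \, h_r[q]$. The multiplication counts are exact for this implementation: $N^2 L^2$ for the direct version and $N^2 L + (N + L - 1) N L$ for the separable one, since the column pass runs over the slightly wider row-filtered image; both approach the orders quoted above.

```python
def convolve(a, b):
    c = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def conv2d_direct(img, k):
    """Full 2-D convolution of an R x C image with a P x Q kernel; returns (output, #mults)."""
    R, C, P, Q = len(img), len(img[0]), len(k), len(k[0])
    out = [[0.0] * (C + Q - 1) for _ in range(R + P - 1)]
    for i in range(R):
        for j in range(C):
            for p in range(P):
                for q in range(Q):
                    out[i + p][j + q] += img[i][j] * k[p][q]
    return out, R * C * P * Q

def conv2d_separable(img, hc, hr):
    """Same result for k[p][q] = hc[p] * hr[q]: filter the rows, then the columns."""
    rows = [convolve(row, hr) for row in img]
    mults = len(img) * len(img[0]) * len(hr)
    cols = [convolve([row[j] for row in rows], hc) for j in range(len(rows[0]))]
    mults += len(rows[0]) * len(img) * len(hc)
    out = [[cols[j][i] for j in range(len(cols))] for i in range(len(cols[0]))]
    return out, mults

N = 6
img = [[float((i * 3 + j) % 5) for j in range(N)] for i in range(N)]
hc, hr = [1.0, 2.0, 1.0], [1.0, 0.0, -1.0]
k = [[a * b for b in hr] for a in hc]
(d, md), (s, ms) = conv2d_direct(img, k), conv2d_separable(img, hc, hr)
print(md, ms)
```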

A multidimensional filter bank can be implemented in its polyphase form, bringing
the complexity down to the order of a single nondownsampled convolution, just
as in the one-dimensional case. A few cases of particular interest allow further
reductions in complexity.

Fully Separable Case When both filters and downsampling are separable, then
the system is the direct product of one-dimensional systems. The implementation
is done separately over each dimension. For example, consider a two-dimensional
system filtering an $N \times N$ image into four subbands using the filters $\{H_0(z_1)H_0(z_2),
H_0(z_1)H_1(z_2), H_1(z_1)H_0(z_2), H_1(z_1)H_1(z_2)\}$, each of size $L \times L$, followed by separable
downsampling by two in each dimension. This requires $N$ decompositions in one
dimension (one for each row), followed by $N$ decompositions in the other, or a total
of $2N^2 \cdot L$ multiplications and a similar number of additions. This is a saving of the
order of $L/2$ with respect to the nonseparable case. Note that if the decomposition
is iterated on the lowpass only (that is, a separable transform), the complexity is
only

$$C_{tot} = C + \frac{C}{4} + \frac{C}{16} + \cdots < \frac{4}{3} C,$$

where $C$ is the complexity of the first stage.

Separable Polyphase Components The last example led automatically to separable
polyphase components, because in the case of separable downsampling, there
is a direct relationship between separability of the filter and its polyphase
components [163]. When the downsampling is nonseparable, separable filters yield
nonseparable polyphase components in general. Thus, it might be more efficient
to compute convolutions with the filters rather than their polyphase components.
Finally, one can construct filter banks with separable polyphase components
(corresponding to nonseparable filters in the nonseparable downsampling case), thus
having an efficient implementation and yielding savings of order $L/2$.

6.3 Complexity of Wavelet Series Computation 

The computational complexity of evaluating expansions into wavelet bases is con- 
sidered in this section, as well as that of related problems such as iterated filters 
used in regularity estimates of wavelets. 

6.3.1 Expansion into Wavelet Bases 

Assume a multiresolution analysis structure as defined in Section 4.2. If we have
the projection onto $V_0$, that is, samples $x[n] = \langle \varphi(t - n), x(t) \rangle$, then Mallat's
algorithm given in Section 4.5.3 indicates that the expansion onto $W_i$, $i = 1, 2, \ldots$
can be evaluated using an octave-band filter bank. Therefore, given the initial
projection, the complexity of the wavelet expansion is of order $2L$ multiplications
and $2L$ additions/input sample (see (6.2.7)), where $L$ is the length of the
discrete-time filter, or equivalently, the order of the two-scale equation. Unless the wavelet
$\psi(t)$ is compactly supported, $L$ could be infinite. For example, many of the wavelets
designed in the Fourier domain (such as Meyer's and Battle-Lemarie's wavelets) lead
to an unbounded $L$. In general, implementations simply truncate the infinitely long
filter and a reasonable approximation is computed with finite computational cost.

A more attractive alternative is to find recursive filters which perform an exact 
computation at finite computational cost. An example is in the case of spline 
spaces (see Section 4.3.2), where instead of the usual Battle-Lemarie wavelet, an 
alternative one can be used which leads to an IIR filter implementation [133, 296]. 

When we cannot assume access to the projection onto $V_0$, an approximation known as Shensa's algorithm [261] can be used (see Section 4.5.3). As an initial step, it computes a nonorthogonal projection of the input and the wavelets onto suitable approximation spaces. In terms of computational complexity, Shensa's algorithm involves a prefiltering stage with a discrete-time filter, thus adding on the order of $2L_p$ operations, where $L_p$ is the length of the prefilter.

Therefore, the computation of the wavelet series into $K$ octaves requires about

$$2L\left(1 - \frac{1}{2^K}\right) + L_p$$

multiplications and a similar number of additions. Of course, when fast Fourier transforms are used, the orders $L$ and $L_p$ are reduced to their logarithms. This efficiency in computing, in discrete time, a series expansion which normally involves integrals is one of the main attractive features of the wavelet decomposition.
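As a rough check of the count $2L(1 - 1/2^K)$, the following NumPy sketch runs the octave-band analysis tree on a periodized signal, computes only the samples that survive downsampling, and tallies multiplications. The function name and the periodic boundary handling are assumptions made for illustration, and the prefilter term $L_p$ of Shensa's algorithm is not modeled.

```python
import numpy as np

def octave_band_analysis(x, h0, h1, K):
    """K-octave analysis tree (Mallat's algorithm) with periodic extension.

    Only the outputs kept after downsampling by 2 are computed, so a stage
    whose input has n samples costs n*L multiplications (n/2 outputs per
    filter, two filters, L taps each).
    Returns (list of highpass bands, final lowpass band, multiplication count).
    """
    x = np.asarray(x, dtype=float)
    h0 = np.asarray(h0, dtype=float)
    h1 = np.asarray(h1, dtype=float)
    L = len(h0)
    taps = np.arange(L)
    mults = 0
    bands = []
    for _ in range(K):
        n = len(x)
        lo = np.empty(n // 2)
        hi = np.empty(n // 2)
        for k in range(n // 2):
            idx = (2 * k - taps) % n          # samples x[2k - l], periodized
            lo[k] = h0 @ x[idx]
            hi[k] = h1 @ x[idx]
            mults += 2 * L
        bands.append(hi)
        x = lo                                # next octave runs at half the rate
    return bands, x, mults

# Usage: Haar filters (L = 2), a length-64 signal, K = 3 octaves
h0 = np.array([1.0, 1.0]) / np.sqrt(2)
h1 = np.array([1.0, -1.0]) / np.sqrt(2)
x = np.random.default_rng(0).standard_normal(64)
bands, lo, mults = octave_band_analysis(x, h0, h1, K=3)
print(mults / len(x), 2 * len(h0) * (1 - 2.0 ** -3))   # 3.5 3.5
```

The per-sample count matches $2L(1 - 1/2^K)$ because each octave runs at half the rate of the previous one.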
6.3.2 Iterated Filters

The previous section showed a completely discrete-time algorithm for the computation of the wavelet series. However, underlying this scheme are continuous-time functions $\varphi(t)$ and $\psi(t)$, which often correspond to iterated discrete-time filters. Such iterated filters are usually computed during the design stage of a wavelet transform, to verify properties of the scaling function and wavelet such as regularity. Because this computation is done only once, reducing its complexity is not as important as in the computation of the transform itself. However, the algorithms are simple and the computational burden can be heavy, especially in multiple dimensions; thus, we briefly discuss fast algorithms for iterated filters. Recall from (4.4.9) that we wish to compute

$$G_0^{(i)}(z) = \prod_{k=0}^{i-1} G_0\left(z^{2^k}\right). \tag{6.3.1}$$

For simplicity, we will omit the subscript "0" and simply call the lowpass filter $G$. The length of $G^{(i)}(z)$ is equal to

$$L^{(i)} = (2^i - 1)(L - 1) + 1.$$

From (6.3.1), the following identities can be verified (Problem 6.5):

$$G^{(i)}(z) = G(z)\cdot G^{(i-1)}(z^2), \tag{6.3.2}$$

$$G^{(i)}(z) = G\left(z^{2^{i-1}}\right)\cdot G^{(i-1)}(z), \tag{6.3.3}$$

$$G^{(2^k)}(z) = G^{(2^{k-1})}(z)\cdot G^{(2^{k-1})}\left(z^{2^{2^{k-1}}}\right). \tag{6.3.4}$$
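The identities (6.3.2)-(6.3.4) and the length formula are easy to confirm numerically. The sketch below (NumPy; the helper names `upsample_poly` and `iterate` and the example lowpass filter are assumptions) builds $G^{(i)}(z)$ directly from (6.3.1) by polynomial multiplication and compares it with each factorization.

```python
import numpy as np

def upsample_poly(g, m):
    """Coefficients of G(z^m) from those of G(z) (powers of z^{-1})."""
    g = np.asarray(g, dtype=float)
    out = np.zeros((len(g) - 1) * m + 1)
    out[::m] = g
    return out

def iterate(g, i):
    """G^{(i)}(z) = prod_{k=0}^{i-1} G(z^{2^k}), computed straight from (6.3.1)."""
    out = np.array([1.0])
    for k in range(i):
        out = np.convolve(out, upsample_poly(g, 2 ** k))
    return out

g = np.array([1.0, 3.0, 3.0, 1.0]) / 8          # example lowpass filter, L = 4
L, i = len(g), 5
gi = iterate(g, i)

assert len(gi) == (2 ** i - 1) * (L - 1) + 1                                   # L^{(i)}
assert np.allclose(gi, np.convolve(g, upsample_poly(iterate(g, i - 1), 2)))    # (6.3.2)
assert np.allclose(gi, np.convolve(upsample_poly(g, 2 ** (i - 1)),
                                   iterate(g, i - 1)))                         # (6.3.3)
k = 2                                                                          # (6.3.4): 2^k = 4
half = iterate(g, 2 ** (k - 1))
assert np.allclose(iterate(g, 2 ** k),
                   np.convolve(half, upsample_poly(half, 2 ** (2 ** (k - 1)))))
```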

The first two relations lead to recursive algorithms, while the last one produces a doubling algorithm and can be used when iterates which are powers of two are desired. Computing (6.3.2) as

$$G^{(i)}(z) = \left[G_0(z^2) + z^{-1}G_1(z^2)\right]\cdot G^{(i-1)}(z^2),$$

where $G_0$ and $G_1$ are the two polyphase components of filter $G$, leads to two products between polynomials of size $L/2$ and $L^{(i-1)} = (2^{i-1} - 1)(L - 1) + 1$. Calling $O[G^{(i)}(z)]$ the number of multiplications for finding $G^{(i)}(z)$, we get the recursion

$$O[G^{(i)}(z)] = L\cdot L^{(i-1)} + O[G^{(i-1)}(z)].$$

Again, because $G^{(i-1)}(z)$ takes about half as much computation as $G^{(i)}(z)$, we get an order of complexity

$$O[G^{(i)}(z)] \approx 2\cdot L\cdot L^{(i-1)} \approx 2^i\cdot L^2 \tag{6.3.5}$$

for multiplications, and similarly for additions.
For a Fourier-domain evaluation, it turns out that the factorization (6.3.3) is more appropriate. In (6.3.3), we have to compute $2^{i-1}$ products between polynomials of size $L$ (corresponding to $G(z)$) and of size $L^{(i-1)}/2^{i-1}$ (corresponding to the polyphase components of $G^{(i-1)}(z)$). Now, $L^{(i-1)}/2^{i-1}$ is roughly of size $L$ as well. That is, using direct polynomial products, (6.3.3) takes $2^{i-1}$ times $L^2$ multiplications and as many additions, and the total complexity is the same as in (6.3.5). However, using FFTs produces a better algorithm. The $L \times L$ polynomial products require two Fourier transforms of length $2L$ and $2L$ frequency products, or $L\log_2 L + 2L$ multiplications using the split-radix FFT. The step leading to $G^{(i)}(z)$ thus uses $2^{i-1}\cdot L(\log_2 L + 2)$ multiplications, and the total complexity is

$$O[G^{(i)}(z)] \approx 2^i\cdot L(\log_2 L + 2)$$

multiplications, and about three times as many additions. This compares favorably to the time-domain evaluation (6.3.5). As usual, this is interesting for medium to large values of $L$. It turns out that the doubling formula (6.3.4), which looks attractive at first sight, does not lead to a more efficient algorithm than the ones we just outlined.
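Below is a minimal sketch of the FFT-based evaluation of (6.3.3), using NumPy's real FFT routines in place of a split-radix FFT. At step $s$, the upsampled filter $G(z^{2^{s-1}})$ only combines samples within each polyphase component of $G^{(s-1)}(z)$, so the step reduces to $2^{s-1}$ independent products with $G(z)$, each computed by FFT. Function names are hypothetical, and `iterate_direct` is included only as a reference.

```python
import numpy as np

def iterate_direct(g, i):
    """Reference: G^{(i)}(z) = prod_k G(z^{2^k}) by direct convolutions."""
    out = np.array([1.0])
    for k in range(i):
        up = np.zeros((len(g) - 1) * 2 ** k + 1)
        up[:: 2 ** k] = g
        out = np.convolve(out, up)
    return out

def iterate_fft(g, i):
    """G^{(i)} via (6.3.3): G^{(s)}(z) = G(z^m) G^{(s-1)}(z), with m = 2^{s-1}.

    Step s is m independent products G(z) * P_j(z), where P_j is the j-th
    polyphase component of G^{(s-1)}; each product uses a zero-padded FFT of
    length >= len(P_j) + L - 1, so there is no circular wrap-around.
    """
    g = np.asarray(g, dtype=float)
    L = len(g)
    prev = g.copy()                                  # G^{(1)} = G
    for s in range(2, i + 1):
        m = 2 ** (s - 1)
        q = -(-len(prev) // m)                       # length of each polyphase component
        comps = np.concatenate([prev, np.zeros(q * m - len(prev))]).reshape(q, m).T
        nfft = 1 << int(np.ceil(np.log2(q + L - 1)))
        G = np.fft.rfft(g, nfft)
        out = np.zeros(m * (q + L - 1))
        for j in range(m):                           # interleave: out[j::m] = g * P_j
            out[j::m] = np.fft.irfft(G * np.fft.rfft(comps[j], nfft), nfft)[: q + L - 1]
        prev = out[: len(prev) + m * (L - 1)]        # drop zeros from padding
    return prev

g = np.random.default_rng(0).standard_normal(6)
print(np.allclose(iterate_fft(g, 5), iterate_direct(g, 5)))   # True
```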

The savings obtained by the above simple algorithms are especially useful in 
multiple dimensions, where the iterates are with respect to lattices. Because mul- 
tidimensional wavelets are difficult to design, iterating the filter might be part of 
the design procedure and thus, reducing the complexity of computing the iterates 
can be important. 

6.4 Complexity of Overcomplete Expansions 

Often, especially in signal analysis, a redundant expansion of the signal is desired. 
This is unlike compression applications, where nonredundant expansions are used. 
As seen in Chapter 5, the two major redundant expansions used in practice are the short-time Fourier (or Gabor) transform and the wavelet transform. While the goal is to approximate the continuous transforms, the computations are necessarily discrete and amount to computing the transforms on denser grids than their orthogonal counterparts, either exactly or approximately, depending on the case.

6.4.1 Short-Time Fourier Transform 

The short-time Fourier transform is computed with a modulated filter bank as in 
(6.2.8)-(6.2.9). The only difference is that the outputs are downsampled by M < N, 
and we do not have a square polyphase matrix as in (6.2.10). However, because the 
modulation is periodic with period N for all filters, there exists a fast algorithm. 
Compute the following intermediate outputs:

$$x_i[n] = \sum_k h[kN + i]\cdot x[n - kN - i]. \tag{6.4.1}$$

Then, the channel signals $y_i[n]$ are obtained by a Fourier transform of the $x_i[n]$'s:

$$\mathbf{y}[n] = \mathbf{F}\cdot\mathbf{x}[n],$$

where $\mathbf{y}[n] = (y_0[n]\ \ldots\ y_{N-1}[n])^T$, $\mathbf{x}[n] = (x_0[n]\ \ldots\ x_{N-1}[n])^T$, and $\mathbf{F}$ is the size-$N \times N$ Fourier matrix. The complexity per output vector $\mathbf{y}[n]$ is $L$ multiplications and about $L - N$ additions (from (6.4.1)) plus a size-$N$ Fourier transform, that is, $(N/2)\log_2 N$ multiplications and three times as many additions. Since $\mathbf{y}[n]$ has a rate $M$ times smaller than the input, we get the following multiplicative complexity per input sample (where $K = N/M$ is the oversampling ratio):

$$\frac{1}{M}\left(L + \frac{N}{2}\log_2 N\right) = K\cdot\left(\frac{L}{N} + \frac{1}{2}\log_2 N\right),$$

that is, $K$ times more than in the critically sampled case given in (6.2.11). The additive complexity is similar (except for a factor of 3 in front of the $\log_2 N$ term).

Because $M < N$, the polyphase matrix is nonsquare, of size $N \times M$, and does not have a structure as simple as the one given in (6.2.10). However, if $N$ is a multiple of $M$, some structural simplifications can be made.
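The fast algorithm, (6.4.1) followed by a size-$N$ FFT, can be sketched as below. It assumes modulated filters $h_j[l] = h[l]\,W_N^{jl}$ with $W_N = e^{-j2\pi/N}$ (the sign convention is our assumption) and an input that is zero before time 0. The direct filter-then-downsample version `stft_direct` is included only as a reference; both function names are hypothetical.

```python
import numpy as np

def stft_polyphase(x, h, N, M):
    """Outputs y[rM] = F x[rM] of the modulated filter bank, via (6.4.1) + FFT.

    Row r of the result is the N-channel output vector at time n = rM.
    Costs L multiplications plus one size-N FFT per output vector.
    """
    x = np.asarray(x, dtype=complex)
    L = len(h)
    times = np.arange(0, len(x), M)
    Y = np.empty((len(times), N), dtype=complex)
    for r, n in enumerate(times):
        xi = np.zeros(N, dtype=complex)
        for l in range(L):                      # l = kN + i, hence i = l % N
            if 0 <= n - l < len(x):
                xi[l % N] += h[l] * x[n - l]
        Y[r] = np.fft.fft(xi)                   # y[n] = F x[n]
    return Y

def stft_direct(x, h, N, M):
    """Reference: filter with every modulated filter, keep every Mth output."""
    taps = np.arange(len(h))
    cols = []
    for ch in range(N):
        h_ch = h * np.exp(-2j * np.pi * ch * taps / N)   # h[l] W_N^{ch*l}
        cols.append(np.convolve(x, h_ch)[: len(x)][::M])
    return np.array(cols).T

rng = np.random.default_rng(0)
x = rng.standard_normal(50)
h = rng.standard_normal(16)
Y = stft_polyphase(x, h, N=8, M=4)                        # oversampling ratio K = 2
print(np.allclose(Y, stft_direct(x, h, N=8, M=4)))        # True
```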

6.4.2 "Algorithme à Trous"

Mallat's and Shensa's algorithms compute the wavelet series expansion on a discrete grid corresponding to scales $a_i = 2^i$ and shifts $b_{ij} = j\cdot 2^i$ (see Figure 6.8(a)). We assume $i = 0, 1, 2, \ldots$ in this discussion. The associated wavelets form an orthonormal basis, but the transform is not shift-invariant, which can be a problem in signal analysis or pattern recognition. An obvious cure is to compute all the shifts, that is, to avoid the downsampling (see Figure 6.8(b)). Of course, scales are still restricted to powers of two, but shifts are now arbitrary integers. It is clear that the output at scale $a_i$ is $2^i$-times oversampled. To obtain this oversampled transform, one simply finds the equivalent filters for each branch of the octave-band tree which computes the discrete-time wavelet series. This is shown in Figure 6.9. The filter producing the oversampled wavelet transform at scale $a_i = 2^i$ has a z-transform equal to

$$F_i(z) = H_1\left(z^{2^{i-1}}\right)\prod_{l=0}^{i-2} H_0\left(z^{2^l}\right).$$
Figure 6.8 Sampling of the time-scale plane. (a) Sampling in the orthogonal discrete-time wavelet series. (b) Oversampled time-scale plane in the "algorithme à trous". (c) Multiple voices/octave. The case of three voices/octave is shown.



An efficient computational structure simply computes the signals along the tree and takes advantage of the fact that the filter impulse responses are upsampled, that is, at a stage where the filters are upsampled by $2^k$, their nonzero coefficients are separated by $2^k - 1$ zeros. This led to the name "algorithme à trous" (algorithm with holes) given in [136]. It is immediately obvious that the complexity of a direct implementation is now $2L$ multiplications and $2(L-1)$ additions per octave and input sample, since each octave requires filtering by highpass and lowpass filters which have $L$ nonzero coefficients. Thus, to compute $J$ octaves, the complexity is of the order of

$$4\cdot L\cdot J \text{ operations/input sample},$$

that is, a linear increase with the number of octaves. The operations can be moved to the Fourier domain to reduce the order $L$ to an order $\log_2 L$, and octaves can be merged, just as in the critically sampled case. A careful analysis of the resulting complexity is made in [245], showing gains with Fourier methods for filters of medium length ($L > 9$).

Figure 6.9 Oversampled discrete-time wavelet series. (a) Critically sampled case. (b) Oversampled case obtained from (a) by deriving the equivalent filters and skipping the downsampling. This approximates the continuous-time wavelet transform.
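The following NumPy sketch, with hypothetical function names, runs the undecimated tree by convolving with upsampled filters and checks each band against the equivalent filter $F_i(z)$ given above. Note that `np.convolve` also multiplies the inserted zeros, so the sketch illustrates the structure rather than the $2L$-per-octave cost; reaching that cost requires touching only the $L$ nonzero taps.

```python
import numpy as np

def upsample_filter(h, f):
    """Coefficients of H(z^f): f - 1 zeros ("holes") between the taps."""
    out = np.zeros((len(h) - 1) * f + 1)
    out[::f] = h
    return out

def a_trous(x, h0, h1, J):
    """Undecimated octave-band tree: octave i + 1 uses H0, H1 upsampled by 2^i,
    and nothing is downsampled. Returns (highpass bands, final lowpass)."""
    bands, lo = [], np.asarray(x, dtype=float)
    for i in range(J):
        bands.append(np.convolve(lo, upsample_filter(h1, 2 ** i)))
        lo = np.convolve(lo, upsample_filter(h0, 2 ** i))
    return bands, lo

def equivalent_filter(h0, h1, i):
    """F_i(z) = H1(z^{2^{i-1}}) * prod_{l=0}^{i-2} H0(z^{2^l})."""
    f = upsample_filter(h1, 2 ** (i - 1))
    for l in range(i - 1):
        f = np.convolve(f, upsample_filter(h0, 2 ** l))
    return f

rng = np.random.default_rng(0)
x = rng.standard_normal(40)
h0 = np.array([1.0, 3.0, 3.0, 1.0]) / 8      # example filters (assumed)
h1 = np.array([1.0, -3.0, 3.0, -1.0]) / 8
bands, lo = a_trous(x, h0, h1, J=3)
print(all(np.allclose(bands[i - 1], np.convolve(x, equivalent_filter(h0, h1, i)))
          for i in range(1, 4)))              # True
```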



6.4.3 Multiple Voices Per Octave 

While the above algorithm increased the sampling in time, it remained an "octave by octave" algorithm. Sometimes, finer scale changes are desired. Instead of $a = 2^i$, one uses $a = 2^{i + m/M}$, $m = 0, \ldots, M-1$, which gives $M$ "voices" per octave. Obviously, for $m = 0$, one can use the standard octave by octave algorithm, involving the wavelet $\psi(t)$. To get the scales for $m = 1, \ldots, M-1$, one can use the slightly stretched versions

$$\psi^{(m)}(t) = 2^{-m/(2M)}\,\psi\left(2^{-m/M}\,t\right), \qquad m = 1, \ldots, M-1.$$

The tiling of the time-scale plane is shown in Figure 6.8(c) for the case of three voices/octave (compare this with Figure 6.8(a)). Note that lower voices are oversampled, but the whole scheme is redundant in the first place, since one voice would be sufficient. The complexity is $M$ times that of a regular discrete-time wavelet series, if the various voices are computed independently.

The parameters of each of the separate discrete-time wavelet series have to be computed (following Shensa's algorithm), since the discrete-time filters will not be "scales" of each other, but different approximations. Thus, one has to find the appropriate highpass and lowpass filters for each of the $m$-voice wavelets. An alternative is to use the scaling property of the wavelet transform. Since

$$\langle x(t), \psi(at)\rangle = \frac{1}{a}\langle x(t/a), \psi(t)\rangle,$$

we can start a discrete-time wavelet series algorithm with $M$ signals which are scales of each other: $x_m(t) = 2^{m/(2M)}\,x\left(2^{m/M}\,t\right)$, $m = 0, \ldots, M-1$. Again, the complexity is $M$ times higher than that of a single discrete-time wavelet series. The problem is to find the initial sequence which corresponds to the projection of the $x_m(t)$ onto $V_0$. One way to do this is given in [300].
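The scaling property follows from the substitution $u = at$ in the inner-product integral. As a quick numerical sanity check, not part of any algorithm, the sketch below evaluates both sides by Riemann sums. The test signal and the (unnormalized) Mexican-hat wavelet are assumptions, and $a = 2^{1/3}$ corresponds to one voice step for $M = 3$.

```python
import numpy as np

def mexican_hat(t):
    """Assumed example wavelet: second derivative of a Gaussian, unnormalized."""
    return (1 - t ** 2) * np.exp(-t ** 2 / 2)

def x_signal(t):
    """Assumed smooth, rapidly decaying test signal."""
    return np.exp(-t ** 2 / 4) * np.cos(3 * t)

t, dt = np.linspace(-40, 40, 160001, retstep=True)
a = 2 ** (1 / 3)                                      # one voice step for M = 3
lhs = np.sum(x_signal(t) * mexican_hat(a * t)) * dt            # <x(t), psi(at)>
rhs = np.sum(x_signal(t / a) * mexican_hat(t)) * dt / a        # (1/a)<x(t/a), psi(t)>
print(lhs, rhs)                                       # agree to many digits
```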

Finally, one can combine the multivoice algorithm with the "algorithme à trous" to compute a dense grid over scales as well as time. The complexity then grows linearly with the number of octaves and the number of voices, as

$$4\cdot L\cdot J\cdot M \text{ operations/input sample},$$

where $J$ and $M$ are the numbers of octaves and voices, respectively. This is an obvious algorithm, and there might exist more efficient ways yet to be found.

This concludes our discussion of algorithms for oversampled expansions, which 
closely followed their counterparts for the critically sampled case. 

6.5 Special Topics 

6.5.1 Computing Convolutions Using Multirate Filter Banks 

We have considered improvements in computing convolutions that appear in filter 
banks. Now, we will investigate schemes where filter banks can be used to speed 
up convolutions. 
Figure 6.10 Overlap-add algorithm as a filter bank.

Overlap-Add/Save Computation of Running Convolution When computing the linear convolution of an infinite signal with a finite-length filter using fast Fourier transforms, one has to segment the input signal into blocks. Assume a filter of length $L$ and an FFT of size $N > L$. Then, a block of signal of length $N - L + 1$ can be fed into the FFT so as to get the linear convolution of the signal with the filter. The overlap-add algorithm [32, 209] segments the input signal into pieces of length $N - L + 1$, computes the FFT-based convolution, and adds the overlapping tails of adjacent segments ($L - 1$ outputs spill over into the next segment of outputs).

The overlap-save algorithm [32, 209] takes $N$ input samples and computes a circular convolution of which $N - L + 1$ samples are valid linear convolution outputs and $L - 1$ samples are wrap-around effects. These $L - 1$ wrap-around samples are discarded, the $N - L + 1$ valid ones are kept, and the algorithm moves up by $N - L + 1$ samples.
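Both block algorithms fit in a few lines of NumPy. The sketch assumes a real filter and an FFT size $N > L$; the function names are hypothetical, and `np.convolve` serves only as the reference linear convolution.

```python
import numpy as np

def overlap_add(x, c, N):
    """Running convolution with N-point FFTs: blocks of M = N - L + 1 inputs;
    each block's L - 1 sample tail is added into the next block's outputs."""
    x = np.asarray(x, dtype=float)
    L = len(c)
    M = N - L + 1
    C = np.fft.rfft(c, N)
    y = np.zeros(len(x) + N)
    for start in range(0, len(x), M):
        block = x[start:start + M]                       # zero-padded to N by rfft
        y[start:start + N] += np.fft.irfft(np.fft.rfft(block, N) * C, N)
    return y[: len(x) + L - 1]

def overlap_save(x, c, N):
    """Running convolution with N-point FFTs: successive N-sample windows overlap
    by L - 1 samples; the L - 1 wrap-around outputs of each circular convolution
    are discarded and the M = N - L + 1 valid ones are kept."""
    x = np.asarray(x, dtype=float)
    L = len(c)
    M = N - L + 1
    C = np.fft.rfft(c, N)
    xp = np.concatenate([np.zeros(L - 1), x, np.zeros(L - 1 + N)])  # saved history + tail
    out = []
    for start in range(0, len(x) + L - 1, M):
        circ = np.fft.irfft(np.fft.rfft(xp[start:start + N]) * C, N)
        out.append(circ[L - 1:])                         # valid linear-convolution part
    return np.concatenate(out)[: len(x) + L - 1]

rng = np.random.default_rng(0)
x, c = rng.standard_normal(100), rng.standard_normal(9)
print(np.allclose(overlap_add(x, c, 32), np.convolve(x, c)),
      np.allclose(overlap_save(x, c, 32), np.convolve(x, c)))    # True True
```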

Both of these algorithms have an immediate filter bank interpretation [226], which has the advantage of permitting generalizations [317]. We will now focus on the overlap-add algorithm. Computing a size-$N$ FFT with $M = N - L + 1$ nonzero inputs amounts to an analysis filter bank with $N$ channels and downsampling by $M$. The filters are given by [317]

$$H(z) = z^{M-1} + z^{M-2} + \cdots + z + 1,$$

$$H_i(z) = z^{-M+1}\cdot H\left(W_N^i z\right).$$

In the frequency domain, convolution corresponds to pointwise multiplication by the Fourier transform of the filter $c[n]$, given by

$$c_i = \frac{1}{N}\sum_{l=0}^{L-1} c[l]\,W_N^{il}.$$
Finally, the inverse Fourier transform is obtained with upsampling by $M$ followed by filtering with an $N$-channel synthesis bank where the filters are given by

$$G(z) = 1 + z^{-1} + z^{-2} + \cdots + z^{-N+1},$$

$$G_i(z) = G\left(W_N^i z\right).$$

The algorithm is sketched in Figure 6.10. The proof that it does compute a running convolution follows simply by identifying the various steps with those of the usual overlap-add algorithm. Note that the system produces a delay of $M - 1$ samples (since all filters are causal), that is,

$$Y(z) = z^{-(M-1)}\,C(z)\,X(z).$$
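To check the filter bank reading of overlap-add, the sketch below implements literally the analysis filters $z^{-M+1}H(W_N^i z)$, the channel multipliers $c_i$, and the synthesis filters $G(W_N^i z)$, with $W_N = e^{-j2\pi/N}$ and the $1/N$ normalization placed in $c_i$ (both are assumptions about conventions). Its output should equal the linear convolution delayed by $M - 1$ samples. The function name is hypothetical, and the code is deliberately unoptimized: each analysis step is a dense $N \times M$ product rather than an FFT.

```python
import numpy as np

def overlap_add_filter_bank(x, c, N):
    """Overlap-add as an N-channel filter bank downsampled by M = N - L + 1.

    Analysis:  h_i[k] = W^{i(M-1-k)}, k = 0..M-1   (causal z^{-M+1} H(W^i z))
    Channel:   c_i = (1/N) sum_l c[l] W^{il}
    Synthesis: g_i[k] = W^{-ik},     k = 0..N-1   (G(W^i z))
    """
    x = np.asarray(x, dtype=float)
    c = np.asarray(c, dtype=float)
    L = len(c)
    M = N - L + 1
    W = np.exp(-2j * np.pi / N)
    i = np.arange(N)
    h = W ** np.outer(i, M - 1 - np.arange(M))        # (N, M) analysis taps
    g = W ** (-np.outer(i, np.arange(N)))             # (N, N) synthesis taps
    ci = (W ** np.outer(i, np.arange(L))) @ c / N     # channel multipliers
    T = len(x) + L - 1 + (M - 1)                      # length of the delayed output
    y = np.zeros(T + N, dtype=complex)
    for t in range(0, T, M):                          # analysis output kept at t = rM
        past = np.array([x[t - k] if 0 <= t - k < len(x) else 0.0 for k in range(M)])
        v = ci * (h @ past)                           # filter, downsample, multiply
        y[t:t + N] += g.T @ v                         # upsample, synthesis filters
    return y[:T].real

rng = np.random.default_rng(0)
x, c = rng.standard_normal(60), rng.standard_normal(5)
y = overlap_add_filter_bank(x, c, N=16)               # M = 12, so the delay is 11
print(np.allclose(y[11:], np.convolve(x, c)))          # True
```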

A simple generalization consists in replacing the pointwise multiplications by $c_i$, $i = 0, \ldots, N-1$, by filters $C_i(z)$, $i = 0, \ldots, N-1$. Because the system is linear, we can use the superposition principle and decompose $C_i(z)$ into its components. Call $c_{il}$ the $l$th coefficient of the $i$th filter. Now, the set $\{c_{i0}\}$, $i = 0, \ldots, N-1$, produces an impulse response $c_0[n]$ obtained from the inverse Fourier transform of the coefficients $c_{i0}$. Therefore, because the filters $C_i(z)$ operate in a domain downsampled by $M$, the set $\{c_{il}\}$ produces an impulse response $c_l[n]$ which is the inverse Fourier transform of the $c_{il}$'s delayed by $l\cdot M$ samples.

Finally, if $C_i(z)$ is of degree $K$, the generalized overlap-add algorithm produces a running convolution with a filter of length $(K+1)M$ when $M = L$ and $N = 2M$. Conversely, if an initial filter $c[n]$ is given, one first decomposes it into segments of length $M$, each of which is Fourier transformed into a set $\{c_{il}\}$. That is, a length-$(K+1)M$ convolution is mapped into $N$ size-$(K+1)$ convolutions, where $N$ is about two times $M$, using size-$N$ modulated filter banks. The major advantage of this method is that the delay is substantially reduced, an issue of primary concern in real-time systems. This is because the delay is of the order of the downsampling factor $M$, while a regular overlap-add algorithm would have a delay of the order of $(K+1)\cdot M$.
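Here is a sketch of the generalized overlap-add under the assumptions $N = 2M$ and a filter split into $K + 1$ segments of length $M$ (zero-padded if needed). The FFT of segment $l$ supplies the coefficients $c_{il}$, and each channel runs a length-$(K+1)$ FIR filter across successive block spectra. Names are hypothetical. The code runs offline, so it does not model the reduced delay, and NumPy's `ifft` already includes the $1/N$ factor.

```python
import numpy as np

def generalized_overlap_add(x, c, M):
    """Running convolution with a long filter c via N = 2M point FFTs and
    length-(K+1) FIR filters C_i(z) = sum_l c_{il} z^{-l} in the N channels."""
    x = np.asarray(x, dtype=float)
    c = np.asarray(c, dtype=float)
    N = 2 * M
    K1 = -(-len(c) // M)                                   # K + 1 segments
    c_seg = np.concatenate([c, np.zeros(K1 * M - len(c))]).reshape(K1, M)
    C = np.fft.fft(c_seg, n=N, axis=1)                     # C[l, i] = c_{il}
    B = -(-len(x) // M)                                    # number of input blocks
    x_blk = np.concatenate([x, np.zeros(B * M - len(x))]).reshape(B, M)
    X = np.fft.fft(x_blk, n=N, axis=1)                     # X[r, i]: block spectra
    y = np.zeros((B + K1) * M)
    for r in range(B + K1 - 1):
        Yr = np.zeros(N, dtype=complex)
        for l in range(K1):                                # channel FIR across blocks
            if 0 <= r - l < B:
                Yr += C[l] * X[r - l]
        y[r * M : r * M + N] += np.fft.ifft(Yr).real       # overlap-add with hop M
    return y[: len(x) + len(c) - 1]

# Configuration of Table 6.2(c): length-32 filter, 16-point FFTs, M = 8, K + 1 = 4
rng = np.random.default_rng(0)
x, c = rng.standard_normal(100), rng.standard_normal(32)
print(np.allclose(generalized_overlap_add(x, c, 8), np.convolve(x, c)))   # True
```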

Table 6.2 gives a comparison of several methods for computing running convolu- 
tion, highlighting the trade-off between computational complexity and input-output 
delay, as well as architectural complexity [317]. 

Table 6.2 Computation of running convolution with a length-32 filter (after [317]). The filter and signal are assumed to be complex.

| Method | Delay | Multiplications per point | Architecture |
|---|---|---|---|
| (a) Direct | 0 | 96 | Simple |
| (b) 128-point FFT, downsampled by 97 | 96 | 15 | Complex (128-pt FFTs) |
| (c) 16-point FFT, downsampled by 8, and length-4 channel filters | 7 | 29 | Medium (16-pt FFTs) |
| (d) Same as (c), but with efficient 4-pt convolutions in the channel | 31 | 18.5 | Medium (as (c) plus simple short convolution algorithms) |

Short Running Convolution It is well known that Fourier methods are only worthwhile for efficiently computing convolutions with medium to long filters. If a filter is short, one can use transposition of the short linear convolution algorithms seen in Section 6.1.1 to get efficient running convolutions. For example, the algorithm in (6.1.4) for $2 \times 2$ linear convolution, when transposed, computes two successive outputs of a length-2 filter with impulse response $\{b_1, b_0\}$, since

$$\begin{pmatrix} y_0 \\ y_1 \end{pmatrix} = \begin{pmatrix} b_0 & 0 \\ b_1 & b_0 \\ 0 & b_1 \end{pmatrix}^{T} \begin{pmatrix} x_{-1} \\ x_0 \\ x_1 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix} \begin{pmatrix} b_0 & 0 & 0 \\ 0 & b_0 - b_1 & 0 \\ 0 & 0 & b_1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 \\ 0 & -1 & 0 \\ 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} x_{-1} \\ x_0 \\ x_1 \end{pmatrix}. \tag{6.5.1}$$



The multiplicative complexity is unchanged with respect to the original algorithm, at three multiplications per two outputs (rather than four), while the number of additions goes up from three to four.
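The factorization (6.5.1) can be checked with numbers. The sketch below uses arbitrary example values to compute two outputs of $y[n] = b_1 x[n] + b_0 x[n-1]$ with the three products by $b_0$, $b_0 - b_1$ (precomputed once) and $b_1$, plus four additions, and compares the result with the direct formula.

```python
import numpy as np

b0, b1 = 0.7, -1.3                          # arbitrary example taps
x_m1, x0, x1 = 2.0, -0.5, 4.0               # inputs x_{-1}, x_0, x_1

u = np.array([x_m1 + x0, -x0, x0 + x1])     # rightmost matrix: 2 additions
p = np.array([b0, b0 - b1, b1]) * u         # the 3 multiplications
y = np.array([p[0] + p[1], -p[1] + p[2]])   # left matrix: 2 additions

# Direct evaluation: 4 multiplications, 2 additions
y_direct = np.array([b0 * x_m1 + b1 * x0, b0 * x0 + b1 * x1])
print(np.allclose(y, y_direct))             # True
```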

The same generalization we made for overlap-add algorithms works here as well. That is, the pointwise multiplications in (6.5.1) can be replaced by filters in order to achieve longer convolutions. This again is best looked at as a filter bank algorithm, and Figure 6.11 gives an example of equation (6.5.1) with channel filters instead of pointwise multiplications. After a forward polyphase transform, a polyphase matrix (obtained from the rightmost addition matrix in (6.5.1)) produces the three channel signals. The channel filters are the polyphase components of the desired filter and their difference. Then, a synthesis polyphase matrix (the left addition matrix from (6.5.1)) precedes an inverse polyphase transform. The transfer matrix between forward and inverse polyphase transform is

$$T(z) = \begin{pmatrix} 1 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix} \begin{pmatrix} H_0(z) & 0 & 0 \\ 0 & H_0(z) - H_1(z) & 0 \\ 0 & 0 & H_1(z) \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & -1 \\ z^{-1} & 1 \end{pmatrix} = \begin{pmatrix} H_0(z) & H_1(z) \\ z^{-1}H_1(z) & H_0(z) \end{pmatrix},$$

which is pseudocirculant, as required for a time-invariant system [311]. The above $T(z)$ gives the following input-output relationship for the total system:

$$H_{\text{tot}}(z) = z^{-1}\left(H_0(z^2) + z^{-1}H_1(z^2)\right).$$

That is, at the price of a single delay, we have replaced a length-$L$ convolution by three length-$L/2$ convolutions at half rate, that is, a saving of 25%. This simple example is part of a large class of possible algorithms which have been studied in [198, 199, 317]. Their attractive features are that they are simple, numerically well-conditioned (no approximations are necessary), and the building blocks remain convolutions (for which optimized hardware is available).

Figure 6.11 Fast running convolution algorithm with channel filters. The input-output relationship equals $H_{\text{tot}}(z) = z^{-1}\left(H_0(z^2) + z^{-1}H_1(z^2)\right)$.
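The channel-filter version can be sketched offline with NumPy. As an assumption about conventions, the polyphase components are taken as even and odd samples, $h_0[n] = h[2n]$ and $h_1[n] = h[2n+1]$ (and likewise for the input). The three half-length channel convolutions use $h_0$, $h_0 - h_1$ and $h_1$, and post-additions recombine them. This is one consistent arrangement of the idea, not necessarily the exact polyphase ordering of Figure 6.11, and the function name is hypothetical.

```python
import numpy as np

def fast_running_convolution(h, x):
    """h * x from three half-length, half-rate convolutions (about 3/4 of the
    multiplications of a direct convolution for long inputs)."""
    h = np.asarray(h, dtype=float)
    x = np.asarray(x, dtype=float)
    n_out = len(h) + len(x) - 1
    h = np.concatenate([h, np.zeros(len(h) % 2)])    # pad to even length
    x = np.concatenate([x, np.zeros(len(x) % 2)])
    h0, h1 = h[0::2], h[1::2]                        # polyphase components of the filter
    x0, x1 = x[0::2], x[1::2]                        # polyphase components of the input
    a = np.convolve(h0, x0)                          # channel filter H0
    d = np.convolve(h0 - h1, x0 - x1)                # channel filter H0 - H1
    b = np.convolve(h1, x1)                          # channel filter H1
    n = len(a)
    y = np.zeros(2 * n + 1)
    y[0:2 * n:2] += a                                # even outputs: a[k] ...
    y[2:2 * n + 1:2] += b                            # ... plus b[k-1]
    y[1:2 * n:2] += a + b - d                        # odd outputs: h0*x1 + h1*x0
    return y[:n_out]

rng = np.random.default_rng(0)
h, x = rng.standard_normal(16), rng.standard_normal(64)
print(np.allclose(fast_running_convolution(h, x), np.convolve(h, x)))   # True
```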

6.5.2 Numerical Algorithms 

We will briefly discuss an original application of wavelets to numerical algorithms [30]. These algorithms are approximate even in exact arithmetic, but arbitrary precision can be obtained. Thus, they are unlike the previous algorithms in this chapter, which reduced computations while being exact in exact arithmetic. The idea is that matrices can be compressed just like images! In applications such as the iterative solution of large linear systems, the recurrent operation is a very large matrix-vector product, which has complexity $N^2$. If the matrix is the discrete version of an operator which is smooth (except at some singularities), the wavelet transform² can be used to "compress" the matrix by concentrating most of the energy into well-localized bands. If coefficients smaller than a certain threshold are set to zero, the transformed matrix becomes sparse. Of course, we now deal with an approximated matrix, but the error can be bounded. Beylkin, Coifman and Rokhlin [30] show that for a large class of operators, the number of coefficients after thresholding is of order $N$.

We will concentrate on the simplest version of such an algorithm. Call $W$ the matrix which computes the orthogonal wavelet transform of a length-$N$ vector. Its inverse is simply its transpose. If we desire the matrix-vector product $y = M\cdot x$, we can compute

$$y = W^T\cdot\left(W\cdot M\cdot W^T\right)\cdot W\cdot x. \tag{6.5.2}$$

Recall that $W\cdot x$ has a complexity of order $L\cdot N$, where $L$ is the filter length and $N$ the size of the vector. The complexity of $W\cdot M\cdot W^T$ is of order $L\cdot N^2$, and thus (6.5.2) is not efficient if only one product is evaluated. However, in an iterative algorithm, we can compute $M' = W\cdot M\cdot W^T$ once (at a cost of $LN^2$) and then use $M'$ in the sequel. If $M'$, after thresholding, has order-$N$ nonzero entries, then the subsequent iterations, which are of the form

$$y = W^T\cdot M'\cdot W\cdot x,$$

are indeed of order $N$ rather than $N^2$. It turns out that the computation of $M'$ itself can be reduced to an order-$N$ problem [30]. An interpretation of $M'$ is of interest. Premultiplying $M$ by $W$ is equivalent to taking a wavelet transform of the columns of $M$, while postmultiplying $M$ by $W^T$ amounts to taking a wavelet transform of its rows. That is, $M'$ is the two-dimensional wavelet transform of $M$, where $M$ is considered as an image. Now, if $M$ is smooth, one expects $M'$ to have energy concentrated in some well-defined and small regions. It turns out that the vanishing moments of the wavelets play an important role in concentrating the energy, as they do in image compression. This short discussion only gave a glimpse of these powerful methods, and we refer the interested reader to [30] and the references therein for more details.
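The simplest version of the scheme can be tried on a small example. The sketch below builds an orthonormal multilevel Haar matrix $W$ (coarsest coefficients first) and forms $M' = W M W^T$ for an assumed test operator $M_{ij} = 1/(1 + |i - j|)$. It then thresholds $M'$ and compares $W^T M'_{\text{thr}} W x$ with $Mx$. Dense matrices are used only for clarity; the point of [30] is that $W$ is applied with fast $O(LN)$ transforms and $M'$ is stored sparsely. The threshold is arbitrary, and the printed sparsity and error are illustrations, not results from [30].

```python
import numpy as np

def haar_matrix(N):
    """Orthonormal multilevel Haar transform matrix (N a power of 2)."""
    s = 1 / np.sqrt(2)
    W = np.eye(N)
    n = N
    while n > 1:
        half = n // 2
        step = np.eye(N)
        step[:n, :n] = 0.0
        for k in range(half):
            step[k, 2 * k] = step[k, 2 * k + 1] = s        # lowpass (scaled average)
            step[half + k, 2 * k] = s                       # highpass (scaled difference)
            step[half + k, 2 * k + 1] = -s
        W = step @ W      # split the current lowpass part; earlier details are kept
        n = half
    return W

N = 256
W = haar_matrix(N)
idx = np.arange(N)
Mop = 1.0 / (1.0 + np.abs(idx[:, None] - idx[None, :]))    # assumed test operator
Mp = W @ Mop @ W.T                                         # M' = W M W^T
thr = 1e-3 * np.abs(Mp).max()                              # arbitrary threshold
Mp_t = np.where(np.abs(Mp) >= thr, Mp, 0.0)                # thresholded M'
x = np.random.default_rng(1).standard_normal(N)
y = Mop @ x
y_fast = W.T @ (Mp_t @ (W @ x))                            # (6.5.2) with thresholded M'
print("kept fraction:", np.count_nonzero(Mp_t) / N ** 2)
print("relative error:", np.linalg.norm(y - y_fast) / np.linalg.norm(y))
```

Because $W$ is orthogonal, the error of the product is bounded by the Frobenius norm of the discarded entries times $\|x\|$.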



2 Since this will be a matrix operation of finite dimension, we call it a wavelet transform rather 
than a discrete-time wavelet series. 
Problems

6.1 Toeplitz matrix-vector products: Given a Toeplitz matrix $T$ of size $N \times N$ and a vector $x$ of size $N$, show that the product $Tx$ can be computed with order $N\log_2 N$ operations. The method consists of extending $T$ into a circulant matrix $C$. What is the minimum size of $C$, and how does it change if $T$ is symmetric?

6.2 Block circulant matrices: A block-circulant matrix of size $NM \times NM$ is like a circulant matrix of size $N \times N$, except that the elements are now blocks of size $M \times M$. For example, given two $M \times M$ matrices $A$ and $B$,

$$\begin{pmatrix} A & B \\ B & A \end{pmatrix}$$

is a size-$2M \times 2M$ block-circulant matrix. Show that block-circulant matrices are block-diagonalized by block Fourier transforms of size $NM \times NM$ defined as

$$F_{NM} = F_N \otimes I_M,$$

where $F_N$ is the size-$N$ Fourier matrix, $I_M$ is the size-$M$ identity matrix and $\otimes$ is the Kronecker product (2.3.2).

6.3 The Walsh-Hadamard transform of size $2N$ ($N$ is a power of 2) is defined as

$$W_{2N} = W_2 \otimes W_N, \qquad \text{where} \qquad W_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$

and $\otimes$ is the Kronecker product (2.3.2). Derive an algorithm that uses $N\log_2 N$ additions for a size-$N$ transform.

6.4 Complexity of MUSICAM filter bank: The filter bank used in MUSICAM (see also Section 7.2.3) is based on modulation of a single prototype of length 512 to 32 bandpass filters. For the sake of this problem, we assume a complex modulation by $W_{32}^{kn}$, that is,

$$h_k[n] = h_p[n]\,W_{32}^{kn}, \qquad W_{32} = e^{-j2\pi/32},$$

and thus the filter bank can be implemented using polyphase filters and an FFT (see Section 6.2.3). In a real MUSICAM system, the modulation is with cosines and the implementation involves polyphase filters and a fast DCT; thus, it is very similar to the complex case we analyze here. Assuming an input sampling rate of 44.1 kHz, give the number of operations per second required to compute the filter bank.

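6.5 Iterated filters: Consider

$$H^{(i)}(z) = \prod_{k=0}^{i-1} H\left(z^{2^k}\right), \qquad i = 1, 2, \ldots$$

and prove the following recursive formulas:

$$H^{(i)}(z) = H(z)\cdot H^{(i-1)}(z^2),$$

$$H^{(i)}(z) = H\left(z^{2^{i-1}}\right)\cdot H^{(i-1)}(z),$$

$$H^{(2^K)}(z) = H^{(2^{K-1})}(z)\cdot H^{(2^{K-1})}\left(z^{2^{2^{K-1}}}\right).$$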
6.6 Overlap-add/save filter banks: Consider a size-4 modulated filter bank downsampled by 2
and implementing overlap-add or save running convolution (see Figure 6.10 for example). 

(a) Derive explicitly the analysis and synthesis filter banks. 

(b) Derive the channel coefficients. How long can the time-domain impulse response be if 
the channel coefficients are scalars and the system is LTI? 

(c) Implement a filter with a longer impulse response than found in (b) above by using 
polynomial channel coefficients. Give an example, and verify that the system is LTI. 

6.7 Consider a 3-channel analysis/synthesis filter bank downsampled by 2, with filtering of the channels (see Figure 3.18). The filters are given by

$$H_0(z) = z^{-1}, \qquad H_1(z) = 1 + z^{-1}, \qquad H_2(z) = 1,$$

$$G_0(z) = 1 - z^{-1}, \qquad G_1(z) = z^{-1}, \qquad G_2(z) = z^{-2} - z^{-1},$$

$$C_0(z) = F_0(z), \qquad C_1(z) = F_0(z) + F_1(z), \qquad C_2(z) = F_1(z).$$

Verify that the overall system is shift-invariant and performs a convolution with a filter having the z-transform $F(z) = \left(F_0(z^2) + z^{-1}F_1(z^2)\right)z^{-1}$.



Signal Compression and Subband Coding 



"That which shrinks must first expand."
- Lao-Tzu, Tao Te Ching 



The compression of signals, which is one of the main applications of digital signal processing, uses signal expansions as a major component. Some of these expansions were discussed in previous chapters, most notably discrete-time expansions via filter banks. When the channels of a filter bank are used for coding, the resulting scheme is known as subband coding. The reasons for expanding a signal and processing it in the transform domain are numerous. While source coding can be performed on the original signal directly, it is usually more efficient to find an appropriate transform. By efficient we mean that, for a given complexity of the encoder, better compression is achieved.

The first useful property of transforms, or "generalized" transforms such as subband coding, is their decorrelation property. That is, in the transform domain, the transform coefficients are not correlated, which is equivalent to diagonalizing the autocovariance matrix of the signal, as will be seen in Section 7.1. This diagonalization property is similar to the convolution property (or the diagonalization of circulant matrices) of the Fourier transform, as we discussed in Section 2.4.8. However, the only transform that achieves exact diagonalization, the Karhunen-Loève transform, is usually impractical. Many other transforms come close to exact diagonalization and are therefore popular, such as the discrete cosine transform, or appropriately designed subband or wavelet transforms. The second advantage of transforms is that the new domain is often more appropriate for quantization using perceptual criteria. That is, the transform domain can be used to distribute errors in a way that is less objectionable for the human user. For example, in speech and audio coding, the frequency bands used in subband coding might mimic operations performed in the inner ear, and thus one can exploit the reduced sensitivity or even masking between bands. The third advantage of transform coding is that the previous features come at a low computational price. The transform decomposition itself is computed using fast algorithms as discussed in Chapter 6, quantization in the transform domain is often simple scalar quantization, and entropy coding is done on a sample-by-sample basis.

Together, these advantages produced successful compression schemes for speech, 
audio, images and video, some of which are now industry standards (32 Kbits/sec 
subband coding for high-quality speech [192], AC [34, 290], PAC [147], and MUSI- 
CAM for audio [77, 279], JPEG for images [148, 327], MPEG for video [173, 201]). 

It is important to note that the signal expansions on which we have focused so far 
are only one of the three major components of such compression schemes. The other 
two are quantization and entropy coding. This three-part view of compression will
be developed in detail in Section 7.1, together with the strong interaction that exists 
among them. That is, in a compression context, there is no need for designing the 
"ultimate" basis function system unless adequate quantization and entropy coding 
are matched to it. This interplay, while fairly obvious, is often insufficiently stressed 
in the literature. Note that this section is a review and can be skipped by readers 
familiar with basic signal compression. 

Section 7.2 concentrates on one-dimensional signal compression, that is, speech 
and audio coding. Subband methods originated from speech compression research, 
and for good reasons: dividing the signal into frequency bands imitates the human
auditory system well enough to be the basis for a series of successful coders. 

Section 7.3 discusses image compression, where transform and subband/wavelet
methods hold a preeminent position. It turns out that representing images
at multiple resolutions is a desirable feature in many systems using image compres- 
sion such as image databases, and thus, subband or wavelet methods are a popular 
choice. We also discuss some new schemes which contain wavelet decompositions 
as a key ingredient. 

Section 7.4 adds one more dimension and discusses video compression. While 
straight linear transforms have been used, they are outperformed by methods using 
a combination of motion-based modeling and transforms. Again, a multiresolution
feature is often desired and will be discussed. 

Section 7.5 discusses joint source-channel coding using multiresolution source 
decompositions and matched channel coding. It turns out that several upcom- 
ing applications, such as digital broadcasting and transmission over highly varying 
channels such as wireless channels or channels corresponding to packet-switched transmission, are improved by using multiresolution techniques.

Figure 7.1 Compression system based on linear transformation. The linear transform $T$ is followed by quantization (Q) and entropy coding (E). The reconstruction is simply $\hat{x} = T^{-1}\hat{y}$. (a) Global view. (b) Multichannel case with scalar quantization and entropy coding.



7.1 Compression Systems Based on Linear Transforms

In this section, we will deal with compression systems, as given in Figure 7.1(a). The linear transformation (T) is the first step in the process, which includes quantization (Q) and entropy coding (E). Quantization introduces nonlinearities in the system and results in loss of information, while entropy coding is a reversible process. A system as given in Figure 7.1 is termed an open-loop system, since there is no feedback from the output to the input. On the other hand, a closed-loop system, such as DPCM (see Figure 7.5), includes the quantization in the loop. We mostly concentrate on open-loop systems, because of their close connection to signal expansions. Following Figure 7.1, we start by discussing various linear transforms with an emphasis on the optimal Karhunen-Loève transform, followed by quantization, and end by briefly describing entropy coding methods. We try to emphasize the interplay among these three parts, as well as indicate the importance of perceptual criteria in designing the overall system. Our discussion is based on the excellent text by Gersho and Gray [109], to which we refer for more details.




This chapter uses results from statistical signal processing, which are reviewed in 
Appendix 7.A. 

Let us here define the measures of quality we will be using. First, the mean square error (MSE), or distortion, equals

$$D = \frac{1}{N}\sum_{i=0}^{N-1} E\left(|x_i - \hat{x}_i|^2\right), \qquad (7.1.1)$$

where $x_i$ are the input values and $\hat{x}_i$ are the reconstructed values. For a zero-mean input, the signal-to-noise ratio (SNR) is given by



$$\mathrm{SNR} = 10\log_{10}\frac{\sigma^2}{D}, \qquad (7.1.2)$$

where $D$ is as given in (7.1.1) and $\sigma^2$ is the input variance. The peak signal-to-noise ratio ($\mathrm{SNR}_p$) is defined as [138]



$$\mathrm{SNR}_p = 10\log_{10}\frac{M^2}{D}, \qquad (7.1.3)$$

where $M$ is the maximum peak-to-peak value in the signal (typically 255 for 8-bit images). Distortion measures based on squared error have shortcomings when assessing the quality of a coded signal such as an image. An improved distortion measure is a perceptually weighted mean square error. Even better are distortion models that include masking. These distortion metrics are signal specific, and some of them will be discussed in conjunction with practical compression schemes in later sections.
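The three quality measures (7.1.1) to (7.1.3) are straightforward to compute from a signal and its reconstruction. The following Python sketch estimates them from a single realization, so the expectation in (7.1.1) is replaced by a sample average and the variance in (7.1.2) by the mean of $x_i^2$ (which assumes a zero-mean input, as in the text). The function names are hypothetical, and `peak=255` is an assumed default for 8-bit data.

```python
import math

def mse(x, x_hat):
    """Sample estimate of the distortion D in (7.1.1)."""
    if len(x) != len(x_hat) or not x:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def snr_db(x, x_hat):
    """SNR of (7.1.2) for a zero-mean input; sigma^2 is estimated by mean(x_i^2)."""
    variance = sum(a * a for a in x) / len(x)
    return 10 * math.log10(variance / mse(x, x_hat))

def psnr_db(x, x_hat, peak=255):
    """Peak SNR of (7.1.3); `peak` plays the role of M (255 assumed for 8-bit data)."""
    return 10 * math.log10(peak ** 2 / mse(x, x_hat))

x = [1.0, -1.0, 2.0, -2.0]
x_hat = [0.9, -1.1, 2.1, -1.9]
print(mse(x, x_hat), snr_db(x, x_hat))   # about 0.01 and about 24 dB
```

Note that the SNR uses the signal's own variance, while the peak SNR uses a fixed reference level, so the two can differ widely for low-contrast signals.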

7.1.1 Linear Transformations 

Assume a vector $x[n] = (x[n], x[n+1], \ldots, x[n+N-1])^T$ of $N$ consecutive samples of a real wide-sense stationary random process (see Appendix 7.A). Typically, these samples are correlated, and coding them independently is inefficient. The idea is to apply a linear transform¹ so that the transform coefficients are decorrelated. While there is no general formal result guaranteeing that decorrelation yields more efficient compression, it turns out in practice (and in certain cases in theory) that scalar quantization of decorrelated transform coefficients is more efficient than direct scalar quantization of the samples.

¹ This can also be seen as a discrete-time series expansion. However, since it is usually implemented as a matrix block transform, we will adhere to the compression literature's convention and call it a transform.

Since we assumed that the process is wide-sense stationary and we will be dealing only with second-order statistics, we do not need to keep the index $n$ for $x[n]$ and can abbreviate it simply as $x$. From now on, we will assume that the process is zero-mean, and thus its autocorrelation and autocovariance are the same, that is, $K[n,m] = R[n,m]$. The autocovariance matrix of the input vector $x$ is

$$K_x = E(x\,x^T).$$

Again, since the process is wide-sense stationary and zero-mean, $K[n,m] = K[n-m] = R[n-m]$ (see Appendix 7.A). Therefore, the matrix $K_x$ has the following form:

$$K_x = \begin{pmatrix} R[0] & R[1] & \cdots & R[N-1] \\ R[1] & R[0] & \cdots & R[N-2] \\ \vdots & \vdots & \ddots & \vdots \\ R[N-1] & R[N-2] & \cdots & R[0] \end{pmatrix}.$$



This matrix is Toeplitz, symmetric (see Section 2.3.5), and nonnegative definite, since all of its eigenvalues are greater than or equal to zero (this holds in general for autocorrelation matrices). Consider now the transformed vector $y$,

$$y = Tx, \qquad (7.1.4)$$

where $T$ is an $N \times N$ unitary matrix, which thus satisfies $T^T T = T T^T = I$. Then the autocovariance of $y$ is

$$K_y = E(yy^T) = E(Txx^TT^T) = T\,E(xx^T)\,T^T = T K_x T^T. \qquad (7.1.5)$$

Karhunen-Loeve Transform We would like to obtain uncorrelated transform coefficients. Recall that for any two coefficients to be uncorrelated, their covariance has to be zero (see Appendix 7.A). Thus, we are looking for a diagonal $K_y$. For that to hold, $T$ has to be chosen with its rows equal to the eigenvectors of $K_x$. Call $v_i$ the eigenvector (normalized to unit norm) of $K_x$ associated with the eigenvalue $\lambda_i$, that is, $K_x v_i = \lambda_i v_i$, and choose the following ordering for the $\lambda_i$'s:

$$\lambda_0 \ge \lambda_1 \ge \cdots \ge \lambda_{N-1} \ge 0, \qquad (7.1.6)$$

where the last inequality holds because $K_x$ is nonnegative definite. Moreover, since $K_x$ is symmetric, there is a complete set of orthonormal eigenvectors (see Section 2.3.2). Take $T$ as

$$T = [v_0\ v_1\ \cdots\ v_{N-1}]^T; \qquad (7.1.7)$$

then, from (7.1.5),

$$K_y = T K_x T^T = T T^T \Lambda = \Lambda, \qquad (7.1.8)$$

where we used $K_x T^T = T^T \Lambda$ (the columns of $T^T$ are eigenvectors of $K_x$), and $\Lambda$ is a diagonal matrix with $\Lambda_{ii} = \lambda_i = \sigma_i^2 = E(y_i^2)$, $i = 0, \ldots, N-1$. The transform defined in (7.1.7), which achieves decorrelation as shown in (7.1.8), is the discrete-time Karhunen-Loeve transform (KLT), or Hotelling transform [109, 138]. The following approximation result is intuitive:

Proposition 7.1 

If only $k$ out of the $N$ transform coefficients are kept, then the coefficients $y_0, \ldots, y_{k-1}$ will minimize the MSE between $x$ and its approximation $\hat{x}$.

Although the proof of this result follows from the general results on orthonormal expansions given in Chapter 2, we describe it here for completeness.

Proof 

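Following (7.1.1), and omitting the constant factor $1/N$, which does not affect the minimization, the MSE is equal to

$$D = E\left[\sum_{i=0}^{N-1}(x_i - \hat{x}_i)^2\right] = E\left((x-\hat{x})^T(x-\hat{x})\right) = E\left((y-\hat{y})^T(y-\hat{y})\right), \qquad (7.1.9)$$

where the last equality follows from the fact that $T$ is a unitary transform, that is, the MSE is conserved between the transform and original domains. Keeping only the first $k$ coefficients means that $\hat{y}_i = y_i$ for $i = 0, \ldots, k-1$ and $\hat{y}_i = 0$ for $i = k, \ldots, N-1$. Then the MSE equals

$$D = E\left(\sum_{i=k}^{N-1} y_i^2\right) = \sum_{i=k}^{N-1} E(y_i^2) = \sum_{i=k}^{N-1} \lambda_i,$$

and, because of the ordering in (7.1.6), this is smaller than or equal to the error obtained by discarding any other set of $N-k$ coefficients. Recall here that the assumption of zero mean still holds.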

Another way to say this is that the first $k$ coefficients contain most of the energy of the transformed signal. This is the "energy packing" property of the Karhunen-Loeve transform. In fact, among all unitary transforms, the KLT is the one that packs the most energy into the first $k$ coefficients.
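The construction (7.1.6) to (7.1.8) and Proposition 7.1 can be checked numerically. The sketch below is illustrative only: it assumes NumPy, uses `numpy.linalg.eigh` to obtain the eigenvectors of a symmetric $K_x$, and takes as input the first-order Gauss-Markov autocovariance $R[k] = \rho^{|k|}$ that appears with the DCT below. The helper names (`klt`, `gauss_markov_cov`, `truncation_mse`) are hypothetical, and the truncation error is computed exactly from $K_x$ rather than estimated from samples.

```python
import numpy as np

def klt(Kx):
    """KLT of (7.1.7): rows of T are unit-norm eigenvectors of the symmetric
    matrix Kx, ordered by decreasing eigenvalue as in (7.1.6)."""
    lam, V = np.linalg.eigh(Kx)          # eigh returns eigenvalues in ascending order
    order = np.argsort(lam)[::-1]        # reorder so that lambda_0 >= lambda_1 >= ...
    return V[:, order].T, lam[order]

def gauss_markov_cov(N, rho):
    """Autocovariance of a unit-variance first-order Gauss-Markov process,
    R[k] = rho**|k| (an assumed example input)."""
    idx = np.arange(N)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def truncation_mse(T, Kx, k):
    """Exact E||x - x_hat||^2 when y = T x is truncated to y_0..y_{k-1}
    and x_hat = T^T y_hat (T assumed orthogonal)."""
    P = T[:k].T @ T[:k]                  # projection onto the kept basis vectors
    E = np.eye(len(Kx)) - P              # x - x_hat = E x
    return np.trace(E @ Kx @ E.T)

N, rho, k = 8, 0.9, 2
Kx = gauss_markov_cov(N, rho)
T, lam = klt(Kx)
Ky = T @ Kx @ T.T                        # (7.1.5)
print(np.allclose(Ky, np.diag(lam)))                         # True: (7.1.8)
print(np.isclose(truncation_mse(T, Kx, k), lam[k:].sum()))   # True: Proposition 7.1
```

For comparison, keeping the first $k$ samples directly (that is, `truncation_mse(np.eye(N), Kx, k)`) leaves an error of $N - k$, which is never smaller than the sum of the discarded eigenvalues.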

There are two major problems with the KLT, however. First, the KLT is signal-dependent, since it depends on the autocovariance matrix. Second, it is computationally complex, since no structure can be assumed for $T$, and thus no fast algorithm can be used. Applying the transform therefore requires on the order of $N^2$ operations.

Discrete Cosine Transform Due to these problems, various approximations to the KLT have been proposed. These approximations usually have fast algorithms for efficient implementation. The most successful is the discrete cosine transform (DCT), which calculates the vector $y$ from $x$ as

$$y_0 = \frac{1}{\sqrt{N}}\sum_{n=0}^{N-1} x_n, \qquad (7.1.10)$$

$$y_k = \sqrt{\frac{2}{N}}\sum_{n=0}^{N-1} x_n \cos\left(\frac{2\pi(2n+1)k}{4N}\right), \qquad k = 1, \ldots, N-1. \qquad (7.1.11)$$

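The DCT was developed [2] as an approximation to the KLT of a first-order Gauss-Markov process with a large positive correlation coefficient $\rho$ ($\rho \to 1$). In this case, $K_x$ is of the following form (assuming unit variance and zero mean):

$$K_x = \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{N-1} \\ \rho & 1 & \rho & \cdots & \rho^{N-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{N-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{N-1} & \rho^{N-2} & \rho^{N-3} & \cdots & 1 \end{pmatrix}.$$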

For $\rho$ close to 1, the DCT approximately diagonalizes $K_x$. In fact, the DCT (as well as some other transforms) is asymptotically equivalent to the KLT of an arbitrary wide-sense stationary process when the block size $N$ tends to infinity [294]. It should be noted that even if the assumptions do not hold exactly (images are not first-order Gauss-Markov), the DCT has proven to be a robust approximation to the KLT, and it is used in several standards for speech, image, and video compression, as we shall see.
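A small numerical illustration of how nearly the DCT diagonalizes $K_x$ when $\rho$ is close to 1. This is a sketch assuming NumPy; it builds the matrix of (7.1.10) and (7.1.11) directly (no fast algorithm), and the block size $N = 8$ and $\rho = 0.95$ are arbitrary choices. The helper names `dct_matrix` and `offdiag_energy` are hypothetical; the second one measures how far $C K_x C^T$ is from diagonal.

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT matrix built from (7.1.10)-(7.1.11); row k is the
    k-th basis vector, so y = C @ x."""
    n = np.arange(N)
    C = np.empty((N, N))
    C[0, :] = 1.0 / np.sqrt(N)                                        # (7.1.10)
    for k in range(1, N):                                             # (7.1.11)
        C[k, :] = np.sqrt(2.0 / N) * np.cos(2 * np.pi * (2 * n + 1) * k / (4 * N))
    return C

def offdiag_energy(K):
    """Sum of squared off-diagonal entries; zero if and only if K is diagonal."""
    return np.sum(K ** 2) - np.sum(np.diag(K) ** 2)

N, rho = 8, 0.95                                    # assumed block size and correlation
idx = np.arange(N)
Kx = rho ** np.abs(idx[:, None] - idx[None, :])     # first-order Gauss-Markov K_x
C = dct_matrix(N)
Ky = C @ Kx @ C.T                                   # covariance of the DCT coefficients
print(np.allclose(C @ C.T, np.eye(N)))              # True: the DCT is unitary
print(offdiag_energy(Kx), offdiag_energy(Ky))       # residual correlation before / after
```

Since a unitary transform preserves the total sum of squared entries of $K_x$, the drop in off-diagonal energy corresponds to energy moved onto the diagonal, that is, to decorrelation.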

The DCT also has shortcomings. The input stream must be divided into blocks in order to perform the transform, and this blocking is quite arbitrary. The block boundaries often cause not only a loss in compression (correlation across the boundaries is not removed) but also annoying blocking effects. This is one of the reasons for using lapped transforms and subband or wavelet coding schemes. However, the goal of these generalized transforms is the same, namely, to create decorrelated outputs from a correlated input stream, and then to quantize the outputs separately.

Discussion We recall that decorrelation leads to independence only if the input is jointly Gaussian (see Appendix 7.A). Also, even independent random variables are better quantized as a block (or as a vector) than as independent scalars, due to sphere packing gains (see the discussion of vector quantization in Section 7.1.2). However, the complexity of doing so is high, and thus scalar quantization is often preferred. It will be shown below, after a discussion of quantization and bit allocation, that the KLT is the optimal linear transformation (under certain assumptions) among block transforms. The performance of subband coding will also be analyzed.

The major point is that all these schemes are unitary transformations of the input, and thus, if $\hat{x}$ and $\hat{y}$ are the approximate versions of $x$ and $y$, respectively, we always have (similarly to (7.1.9))

$$\|x - \hat{x}\| = \|y - \hat{y}\|. \qquad (7.1.12)$$
Figure 7.2 Uniform scalar quantizer with $N = 7$ and $\Delta = 1$. The decision levels $\{x_i\}$ are $\{-5/2, -3/2, -1/2, 1/2, 3/2, 5/2\}$ and the outputs $\{y_i\}$ are $\{-3, -2, -1, 0, 1, 2, 3\}$.



Note that nonorthogonal systems (such as linear-phase biorthogonal filter banks) are usually designed to satisfy (7.1.12) approximately. If they do not, there is a risk that small errors in the transform domain are magnified after reconstruction. The key problem now is to design the set of quantizers so as to minimize $E(\|y - \hat{y}\|^2)$.

7.1.2 Quantization 

While we deal with discrete-time signals in this chapter, the sample values are real numbers, that is, continuously distributed in amplitude. In order to achieve compression, we need to map the real values of the samples into a discrete set, or discrete alphabet. This process of mapping the real line into a countable discrete alphabet is called quantization. In practical situations, the sample values are mapped into a finite alphabet. An excellent treatment of quantization can be found in [109]. In its simplest form, each sample is quantized individually, which is called scalar quantization. A more powerful method consists of quantizing several samples at once, which is referred to as vector quantization. Also, one can quantize the difference between a signal and a suitable prediction of it, which is called predictive quantization. We would like to stress here that the results on optimal quantization for a given signal are well known, and can be found in [109, 143].

Scalar Quantization An example of a scalar quantizer is shown in Figure 7.2. The input range is divided into intervals $I_i = (x_{i-1}, x_i]$ (a partition of the real line), and the output value $y_i$ is typically chosen in the interval $I_i$. The set $\{y_i\}$ is called the codebook and the $y_i$ are the codewords. For the simple uniform quantizer shown in Figure 7.2, the intervals are of the form $(i - 1/2, i + 1/2]$ and $y_i = i$. Note that the number of intervals is finite. Thus, there are two unbounded intervals, which correspond to what are called the "overload" regions of the quantizer, that is, $x < -5/2$ and $x > 5/2$. Given that the number of intervals is $N$, there are $N$ output symbols. Thus, $R = \lceil \log_2 N \rceil$ bits are needed to represent the output of the quantizer, and this is called the rate. The operation of selecting the interval is sometimes called coding, while assigning the output value $y_i$ to the interval $I_i$ is called decoding. Thus, we have a two-step process

$$x \in (x_{i-1}, x_i] \;\xrightarrow{\ \text{coder}\ }\; i \;\xrightarrow{\ \text{decoder}\ }\; y_i.$$
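The coder/decoder pair of Figure 7.2 can be written down in a few lines. This Python sketch assumes an odd number of levels centered on zero, as in the figure ($N = 7$, $\Delta = 1$); the function names are hypothetical. The clamping in `encode` is what implements the two overload regions.

```python
import math

def encode(x, delta=1.0, n_levels=7):
    """Coder: return the index i of the interval ((i - 1/2) delta, (i + 1/2) delta]
    containing x.  Indices are clamped to -(N-1)/2..(N-1)/2, so the two outermost
    intervals absorb the overload regions.  Assumes odd n_levels, as in Figure 7.2."""
    half = (n_levels - 1) // 2
    i = math.ceil(x / delta - 0.5)
    return max(-half, min(half, i))

def decode(i, delta=1.0):
    """Decoder: output level y_i = i * delta."""
    return i * delta

def rate_bits(n_levels):
    """Fixed-length rate R = ceil(log2 N) bits per sample."""
    return math.ceil(math.log2(n_levels))

inputs = [-10, -0.5, 0.5, 0.51, 2.4, 7.0]
print([decode(encode(v)) for v in inputs])   # [-3.0, -1.0, 0.0, 1.0, 2.0, 3.0]
print(rate_bits(7))                          # 3
```

Only the index $i$ needs to be transmitted; the decoder's codebook turns it back into $y_i$.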

The performance of a quantizer is measured as the distance between the input and the output, and typically, the squared error is used:

$$d(x, \hat{x}) = |x - \hat{x}|^2.$$

Given an input distribution, the worst-case or, more often, the average distortion is measured. Thus, the MSE is

$$D = E\left(|x - \hat{x}|^2\right) = \sum_i \int_{I_i} (x - y_i)^2 f_x(x)\,dx, \qquad (7.1.13)$$

where $f_x(x)$ is the probability density function (pdf) of $x$. For example, assume a uniform input pdf and a bounded input with $N$ intervals; then uniform quantization with intervals of width $\Delta$ and $y_i = (x_i + x_{i-1})/2$ leads to an MSE equal to

$$D = \frac{\Delta^2}{12}. \qquad (7.1.14)$$

The derivation of (7.1.14) is left as an exercise (see Problem 7.1). The error due to quantization is called quantization noise:

$$e[n] = x[n] - \hat{x}[n],$$

if $x$ and $\hat{x}$ are the input and the output of the quantizer, respectively. While $e[n]$ is a deterministic function of $x[n]$, it is often modeled as a noise process that is uncorrelated with the input, white, and uniformly distributed. This is called an additive noise model, since the output $\hat{x}[n] = x[n] - e[n]$ is then the input plus an additive noise term. While this is clearly an approximation, it is a fair one in the case of high-resolution uniform quantization (when $\Delta$ is much smaller than the standard deviation $\sigma$ of the input signal and $N$ is large).

Uniform quantization, while not optimal for nonuniform input pdf's, is very simple and thus often used in practice. One design parameter, besides the quantization step $\Delta$, is the number of intervals, or equivalently the boundaries that delimit the overload region. Usually, they are chosen as a multiple of the standard deviation $\sigma$ of the input pdf (typically, $4\sigma$ away from the mean). Given constant boundaries $a$ and $b$, we have $\Delta = (b-a)/N$. Thus, $\Delta$ decreases as $1/N = 1/2^R$, where $R$ is the number of bits of the quantizer. The distortion $D$ is of the form (following (7.1.14))

$$D = \frac{\Delta^2}{12} = \frac{(b-a)^2}{12 \cdot 2^{2R}} = \sigma^2\, 2^{-2R} = C\, 2^{-2R}, \qquad (7.1.15)$$

since $\sigma^2 = (b-a)^2/12$ for a uniform input pdf. In general, $C$ is a function of $\sigma^2$ and depends on the distribution. This means that the SNR goes up by 6 dB for every additional bit in the quantizer. To see this, add a bit to $R$, $R' = R + 1$. Then

$$D' = C\, 2^{-2(R+1)} = C\, 2^{-2R}\, 2^{-2}.$$

The new SNR equals (using (7.1.2))

$$\mathrm{SNR}' = 10\log_{10}\frac{\sigma^2}{C\, 2^{-2R}\, 2^{-2}} = \mathrm{SNR} + 10\log_{10} 4 \approx \mathrm{SNR} + 6\ \mathrm{dB}.$$
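The 6 dB-per-bit rule can be observed numerically. This sketch assumes a uniform input on $[-1, 1)$ and a uniform quantizer with $N = 2^R$ cells and midpoint reconstruction, which is the setting of (7.1.14) and (7.1.15). Instead of random samples, it averages the squared error over an evenly spaced grid of inputs, so the result is deterministic. Function names are hypothetical.

```python
import math

def uniform_quantize(x, a, b, n_levels):
    """Uniform quantizer on [a, b): n_levels cells of width (b - a) / n_levels,
    reconstruction at the cell midpoint; inputs outside [a, b) are clamped
    to the outermost cells (overload)."""
    delta = (b - a) / n_levels
    i = min(n_levels - 1, max(0, int((x - a) // delta)))
    return a + (i + 0.5) * delta

def snr_db_uniform(R, a=-1.0, b=1.0, samples=64_000):
    """SNR (7.1.2) of an R-bit uniform quantizer for a uniform input on [a, b).
    D is averaged over a fine, evenly spaced grid of inputs (deterministic)."""
    n_levels = 2 ** R
    xs = [a + (k + 0.5) * (b - a) / samples for k in range(samples)]
    D = sum((x - uniform_quantize(x, a, b, n_levels)) ** 2 for x in xs) / samples
    variance = (b - a) ** 2 / 12          # variance of the uniform input
    return 10 * math.log10(variance / D)

for R in range(1, 6):
    print(R, round(snr_db_uniform(R), 2))   # SNR grows by about 6.02 dB per extra bit
```

The exact increment is $10\log_{10}4 \approx 6.02$ dB, which is where the "6 dB per bit" rule of thumb comes from.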

When the pdf is not uniform, optimal quantization will not be uniform either. An 
optimal MSE quantizer is one that minimizes D in (7.1.13) for a given number 
of output symbols N. For a quantizer to be MSE optimal, it has to satisfy the 
following two necessary conditions [109]: 

(a) Nearest neighbor condition For a given set of output levels, the optimal parti- 
tion cells are such that an input is assigned to the nearest output level. For 
MSE minimization, this leads to the midpoint decision level between every two 
adjacent output levels. 

(b) Centroid condition Given a partition of the input, the optimal decoding levels with respect to the MSE are the centroids of the intervals, that is, $y_i = E(x \mid x \in I_i)$.

Note that such a quantizer is not necessarily optimal for compression, since it does not take entropy coding into account.² The two conditions are sketched in Figure 7.3. Both conditions are intuitive and can be used to verify the optimality of a quantizer, or actually to design an optimal one. This is done in the Lloyd algorithm, which iteratively improves a codebook for a given pdf and a given number of codewords $N$ (the pdf can be given analytically or through measurements). Starting with some initial codebook $\{y_i^{(0)}\}$, it alternates between

(a) Given $\{y_i^{(n)}\}$, find the partition $\{x_i^{(n)}\}$ based on the nearest neighbor condition.

(b) Given $\{x_i^{(n)}\}$, find the next codebook $\{y_i^{(n+1)}\}$ satisfying the centroid condition,

and stops when $D^{(n)}$ is only marginally improved. The resulting quantizer is called a Lloyd-Max quantizer.

² A suitable modification, called entropy-constrained quantization, takes entropy into account in the design of the quantizer.

Figure 7.3 Optimality conditions for scalar quantizers. (a) Nearest neighbor condition. (b) Centroid condition.

The above discussion assumed quantization of a continuous variable into a dis- 
crete set. Often, a discrete input set of size M has to be quantized into a set of 
size N < M. A "discrete" version of the Lloyd algorithm, which uses the same 
necessary conditions (nearest neighbor and centroid), can then be used. 
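Here is a compact sketch of the Lloyd iteration in its "discrete" form, where the pdf is represented by a set of measured samples. It is written in Python under a few assumptions that are not in the text. The training set is sorted once, so that each nearest neighbor partition reduces to binary searches on the decision levels. Prefix sums give the cell centroids. An empty cell simply keeps its previous output level. The function name `lloyd` and the stopping tolerance are illustrative choices.

```python
import bisect
from statistics import NormalDist

def lloyd(samples, codebook, tol=1e-12, max_iter=1000):
    """Lloyd algorithm on a training set: alternate the nearest neighbor and
    centroid conditions until the MSE improves only marginally.
    Returns (output levels, decision levels, MSE per iteration)."""
    xs = sorted(samples)
    M = len(xs)
    s1, s2 = [0.0], [0.0]                     # prefix sums of x and x^2
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)
    y = sorted(codebook)
    history = []
    for _ in range(max_iter):
        # (a) nearest neighbor condition: decision levels midway between outputs
        t = [(y[i] + y[i + 1]) / 2 for i in range(len(y) - 1)]
        cut = [0] + [bisect.bisect_right(xs, ti) for ti in t] + [M]
        # MSE of the current quantizer: sum over cells of sum (x - y_i)^2
        D = 0.0
        for i, yi in enumerate(y):
            n = cut[i + 1] - cut[i]
            sx, sxx = s1[cut[i + 1]] - s1[cut[i]], s2[cut[i + 1]] - s2[cut[i]]
            D += sxx - 2 * yi * sx + n * yi * yi
        history.append(D / M)
        # (b) centroid condition: move each output to the mean of its cell
        y = [(s1[cut[i + 1]] - s1[cut[i]]) / (cut[i + 1] - cut[i])
             if cut[i + 1] > cut[i] else y[i]          # empty cell: keep its level
             for i in range(len(y))]
        if len(history) > 1 and history[-2] - history[-1] <= tol * history[-2]:
            break
    t = [(y[i] + y[i + 1]) / 2 for i in range(len(y) - 1)]
    return y, t, history

# Assumed training set: 20000 evenly spaced quantiles of a unit-variance Gaussian
M = 20_000
data = [NormalDist().inv_cdf((k + 0.5) / M) for k in range(M)]
levels, thresholds, hist = lloyd(data, [-2.0, -0.5, 0.5, 2.0])
print([round(v, 3) for v in levels], round(hist[-1], 4))
```

With four output levels, the iteration should settle close to the classical Lloyd-Max values for a unit-variance Gaussian, output levels near $\pm 0.453$ and $\pm 1.510$ with an MSE near 0.1175. The MSE history is nonincreasing, since each half-step can only lower the distortion.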

While the above method yields quantizers with minimum distortion for a given 
codebook size, entropy coding was not considered. We will see that if entropy 
coding is used after quantization, a uniform quantizer can actually be attractive. 



Vector Quantization While vector quantization (VQ) [109, 120] is much more 
than just a generalization of scalar quantization to multiple dimensions, we will 
only look at it in this restricted way in our brief treatment. Figure 7.4(a) shows a 
regular vector quantizer for a two-dimensional variable. Note that the square is partitioned into convex³ regions and that the boundaries between regions are straight lines (in $N$ dimensions, these would be hyperplanes of dimension $N-1$). Vector quantizers have several advantages over scalar quantizers. For the sake of discussion, we consider a two-dimensional case, but it obviously generalizes to $N$ dimensions.

³ Convex means that if two points $x$ and $y$ belong to one region, then all the points on the straight line segment connecting $x$ and $y$ belong to the same region as well.
Figure 7.4 Vector quantization. (a) Example of a regular vector quantizer in two dimensions. (b) Comparison of scalar and vector quantization. On the left, a two-dimensional probability density function is shown. It equals 2 in the shaded areas and 0 otherwise. Note that $x_0$ and $x_1$ have uniform marginal distributions. For a given distortion, optimal scalar (separable) quantization is shown in the middle, with 4.0 bits, or 2.0 bits/sample. For the same distortion, vector quantization is shown on the right, with 3.0 bits, or 1.5 bits/sample.



(a) Packing gain Even if two variables are independent, there is a gain in quantizing them together. The reason is that there exist better partitions of the space than the rectangular partition obtained when each variable is scalar quantized separately. For example, in two dimensions, it is well known that a hexagonal tiling achieves a smaller MSE than a square tiling for the quantization of uniformly distributed random variables, at the same density of cells (that is, at the same rate). The packing gain increases with dimensionality.

(b) Removal of linear and nonlinear dependencies While linear dependencies could be removed using a linear transformation, VQ also removes nonlinear dependencies. To see this, let us consider the classic example shown in Figure 7.4(b). The two-dimensional probability density function equals 2 in the shaded areas and 0 otherwise. Because the marginal distributions are uniform, scalar quantization of each variable is uniform. Vector quantization "understands" the dependency, and only allocates partitions where necessary. Thus, instead of 4.0 bits, or 2.0 bits/sample, for scalar quantization, we obtain 3.0 bits, or 1.5 bits/sample, for vector quantization, reducing the bit rate by 25% while keeping the same distortion (see Figure 7.4(b)).

(c) Fractional bit rate At low bit rates, choosing between 1.0 bit/sample and 2.0 bits/sample is a rather crude choice. By quantizing several samples together and allocating an integer number of bits to the group, fractional bit rates can be obtained.

Figure 7.5 Predictive quantization. (a) Open-loop linear predictive quantization. (b) Closed-loop predictive quantization or differential pulse code modulation (DPCM).
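To put a number on the two-dimensional packing gain of item (a), one can compare the normalized second moment $G = \frac{1}{n}\int_V \|u - c\|^2\,du \,/\, |V|^{1+2/n}$ of a square cell and of a regular hexagonal cell ($n = 2$, $c$ the cell center, $|V|$ the cell area). A smaller $G$ means a smaller MSE at equal cell density. The square value follows from (7.1.14) applied in each dimension, while the hexagon value is the classical result for the hexagonal lattice, quoted here rather than derived:

$$G_{\square} = \frac{1}{12} \approx 0.0833, \qquad G_{\mathrm{hex}} = \frac{5}{36\sqrt{3}} \approx 0.0802,$$

$$10\log_{10}\frac{G_{\square}}{G_{\mathrm{hex}}} = 10\log_{10}\frac{3\sqrt{3}}{5} \approx 0.17\ \mathrm{dB}.$$

That is, at the same rate the hexagonal tiling lowers the MSE by about 4%, a modest gain that grows with the dimension.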



For a vector quantizer to be MSE optimal, it has to satisfy the same two con- 
ditions we have seen for scalar quantizers, namely: 

(a) The nearest neighbor condition. 

(b) The centroid condition. 

A codebook satisfying these two necessary conditions is locally optimal (small perturbations will not decrease $D$), but is usually not globally optimal. The design of VQ codebooks is thus a sophisticated task, where a good initial guess is crucial and is followed by an iterative procedure. To escape local minima, stochastic relaxation is used. For details, we refer to [109].

A drawback of VQ is its complexity, which limits the size of the vectors that can be used. One solution is to structure the codebook so as to simplify the search for the best-matching vector, given the input. This is achieved with tree-structured VQ. Another approach is to use linear transforms (including subband or wavelet transforms) and to apply VQ to the relevant transform coefficients. Finally, lattice VQ uses multidimensional lattices as a partition, allowing large vectors with reasonable complexity, since lattice VQ is the equivalent of uniform quantization in multiple dimensions.
Predictive Quantization An important and useful technique consists of quantizing, instead of the samples $x[n]$ of the signal to be compressed, the difference between $x[n]$ and a prediction $\tilde{x}[n]$, that is, $d[n] = x[n] - \tilde{x}[n]$ [109, 143]. Obviously, if the prediction is accurate, $d[n]$ will be small. In other words, for a given number of quantization levels, the quantization error will decrease as compared to a direct quantization of $x[n]$. Prediction is usually linear and based on a finite number of past samples. An example is shown in Figure 7.5(a), where $P(z)$ is a strictly causal filter, $P(z) = a_1 z^{-1} + a_2 z^{-2} + \cdots + a_L z^{-L}$. That is, $\tilde{x}[n]$ is predicted as a linear combination of $L$ past samples, $\{x[n-1], \ldots, x[n-L]\}$. Furthermore, $1 - P(z)$ is chosen to be minimum phase so that its inverse, used in the decoder, is a stable filter. Given a predictor order and a stationary input signal, the best linear prediction filter, which minimizes the variance of $d[n]$, is found by solving a set of linear equations involving the autocorrelation matrix of the signal (the Yule-Walker equations).
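The Yule-Walker equations have Toeplitz structure, so they are commonly solved with the Levinson-Durbin recursion rather than by general matrix inversion. The text does not prescribe a solver; the following Python sketch is one standard way, assuming the autocorrelation values $R[0], \ldots, R[L]$ are known (or estimated). It returns the coefficients $a_1, \ldots, a_L$ of $P(z)$ and the resulting prediction error variance; the function name is hypothetical. When all reflection coefficients satisfy $|k_m| < 1$, which is the case for a positive definite autocorrelation matrix, the resulting $1 - P(z)$ is minimum phase, as required above.

```python
def levinson_durbin(R, L):
    """Solve the Yule-Walker equations
           sum_{j=1..L} a_j R[|i - j|] = R[i],   i = 1..L,
    for the predictor x~[n] = sum_j a_j x[n - j], using the Levinson-Durbin
    recursion.  R holds R[0..L].  Returns ([a_1, ..., a_L], error variance)."""
    a = []                      # a[j] stores a_{j+1} of the current order
    err = R[0]                  # order-0 prediction error variance
    for m in range(1, L + 1):
        # reflection coefficient of order m
        acc = R[m] - sum(a[j] * R[m - 1 - j] for j in range(m - 1))
        k = acc / err
        # order update: a_j <- a_j - k a_{m-j}, then append a_m = k
        a = [a[j] - k * a[m - 2 - j] for j in range(m - 1)] + [k]
        err *= 1.0 - k * k
    return a, err

# Assumed example: first-order Gauss-Markov autocorrelation R[k] = 0.9**k
coeffs, err = levinson_durbin([0.9 ** k for k in range(3)], 2)
print(coeffs, err)              # approximately [0.9, 0.0] and 0.19
```

For this first-order Gauss-Markov input, one past sample already carries all the predictable information, so $a_2 = 0$ and the variance drops from $R[0] = 1$ to $1 - 0.9^2 = 0.19$, a ratio of about 5.3.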

An interesting alternative is closed-loop predictive quantization, or differential pulse code modulation (DPCM), as shown in Figure 7.5(b). In the absence of quantization, DPCM is equivalent to the open-loop predictive quantization in Figure 7.5(a). An important feature here is that, since we are predicting $x[n]$ based on its past quantized values $x_q[k]$, $k = n-L, \ldots, n-1$, we can generate the same prediction $\tilde{x}_q[n]$ at the decoder side from these past values $x_q[k]$. The idea is that in the decoder we can add back exactly what was subtracted in the encoder, and thus the error made on the signal is equal to the error made when quantizing the difference signal. In other words, since

$$d[n] = x[n] - \tilde{x}_q[n]$$

and

$$y[n] = d_q[n] + \tilde{x}_q[n],$$

we get

$$E\left(|x[n] - y[n]|^2\right) = E\left(|d[n] - d_q[n]|^2\right),$$

where $x[n]$ and $y[n]$ are the input and output of the DPCM, while $d[n]$ and $d_q[n]$ are the prediction error and its quantized version, respectively.
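The identity $x[n] - y[n] = d[n] - d_q[n]$ holds sample by sample, not only in expectation, and it can be checked with a direct simulation of Figure 7.5(b). In the Python sketch below, the predictor coefficients, the uniform quantizer for $d[n]$, and the test signal are all assumptions chosen for illustration, and the function names are hypothetical. The decoder rebuilds the same prediction from its own past outputs, which is exactly why the loop closes.

```python
import math

def predict(history, a):
    """Prediction sum_j a_j * history[n-1-j] from the last len(a) reconstructed
    samples (taken as zero before the start of the signal)."""
    n = len(history)
    return sum(a[j] * history[n - 1 - j] for j in range(len(a)) if n - 1 - j >= 0)

def quantize(v, step):
    """Uniform mid-tread quantizer for the prediction error."""
    return step * round(v / step)

def dpcm_encode(x, a, step):
    """Closed-loop DPCM: predict from past reconstructed values, quantize
    d[n] = x[n] - prediction, and track the decoder's reconstruction y[n]."""
    y, d, dq = [], [], []
    for xn in x:
        pred = predict(y, a)
        d.append(xn - pred)
        dq.append(quantize(d[-1], step))
        y.append(dq[-1] + pred)          # y[n] = d_q[n] + prediction
    return y, d, dq

def dpcm_decode(dq, a):
    """Decoder: only d_q is received; it forms the same prediction from its own outputs."""
    y = []
    for v in dq:
        y.append(v + predict(y, a))
    return y

x = [math.sin(0.1 * n) for n in range(200)]   # assumed test signal
a, step = [0.95], 0.05                          # assumed first-order predictor and step
y, d, dq = dpcm_encode(x, a, step)
print(dpcm_decode(dq, a) == y)                  # True: decoder tracks the encoder exactly
print(max(abs(xn - yn) for xn, yn in zip(x, y)) <= step / 2 + 1e-12)   # True
```

Because the reconstruction error equals the quantization error of $d[n]$, it never exceeds half a step here, and errors do not accumulate along the loop.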

An important figure of merit of the above closed-loop predictive quantization is the closed-loop prediction gain. It is defined as the ratio of the variances of the input and of the prediction error,

$$G = \frac{\sigma_x^2}{\sigma_d^2}.$$

Note that when the quantization is coarse, this can be quite different from the open-loop prediction gain, which is the equivalent ratio but with the prediction computed as in Figure 7.5(a). For practical reasons, the predictor $P(z)$ in the closed-loop case is usually chosen as in the open-loop case, that is, we use the predictor coefficients that are optimal for the true past $L$ samples of the signal.

A further improvement involves adaptive prediction, which can be used in both the open-loop and the closed-loop cases. The predictor is updated every $K$ samples based on the local signal characteristics and sent to the decoder as side information.

Linear predictive quantization is used successfully in speech and image compression (in both the open-loop and closed-loop forms). In video, a special form of adaptive DPCM over time involves motion-based prediction, called motion compensation, which is discussed in Section 7.4.2.

Bit Allocation Looking back at the transform coding diagram in Figure 7.1, the obvious question is: How do we choose the quantizers for the various transform coefficients? This is a classical resource allocation problem, where one tries to maximize (or minimize) a cost function that describes the quality of approximation under the constraint of finite resources, that is, a given number of bits that can be used to code the signal. Let us first recall an important fact: The total squared error between the input and the output is the sum of the individual errors, because the transform is unitary. To see this, call $x$ and $\hat{x}$ the input and the reconstructed input, respectively. Then $y$ and $\hat{y}$ will be the input and the output of the quantizer. That is,

$$y = Tx, \qquad \hat{x} = T^T \hat{y},$$

where the last equation holds since the transform $T$ is unitary, that is, $T^T T = T T^T = I$. Then the total distortion is

$$D = E\left((x-\hat{x})^T(x-\hat{x})\right) = E\left((y-\hat{y})^T\, T T^T\, (y-\hat{y})\right) = E\left((y-\hat{y})^T(y-\hat{y})\right) = E\left(\sum_{i=0}^{N-1}(y_i-\hat{y}_i)^2\right) = \sum_{i=0}^{N-1} D_i,$$

where $D_i$ is the expected squared error of the $i$th coefficient. Then the bit allocation problem is to minimize

$$D = \sum_{i=0}^{N-1} D_i, \qquad (7.1.16)$$

while satisfying the bit budget

$$\sum_{i=0}^{N-1} R_i \le R, \qquad (7.1.17)$$
where $R$ is the total budget and $R_i$ the number of bits allocated to the $i$th coefficient. A dual situation appears when a maximum allowable distortion is given and the rate has to be minimized. Before considering specific allocation procedures, we will discuss some aspects of optimal solutions.

Figure 7.6 Rate distortion and bit allocation. (a) Rate-distortion curve for a statistically described source (solid line) and an operational rate-distortion curve (dashed line) based on a set of quantizers. (b) Constant-slope solution for an optimal allocation between two sources having the above rate-distortion curves.
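The additivity of the distortion over coefficients, derived above for a unitary $T$, is easy to confirm numerically. This sketch assumes NumPy, takes a random orthogonal matrix (from a QR decomposition) as a stand-in for any unitary block transform, and quantizes each coefficient with its own, hypothetical, step size.

```python
import numpy as np

def quantize(v, steps):
    """Uniform mid-tread quantizer, one step size per coefficient."""
    return steps * np.round(v / steps)

rng = np.random.default_rng(0)                       # assumed test data
N = 8
T, _ = np.linalg.qr(rng.standard_normal((N, N)))     # an N x N orthogonal (real unitary) matrix
steps = np.array([0.1, 0.1, 0.2, 0.2, 0.5, 0.5, 1.0, 1.0])   # hypothetical step sizes

x = rng.standard_normal(N)
y = T @ x                                            # y = T x
y_hat = quantize(y, steps)                           # scalar quantization per coefficient
x_hat = T.T @ y_hat                                  # x_hat = T^T y_hat

D_signal = np.sum((x - x_hat) ** 2)                  # squared error in the signal domain
D_i = (y - y_hat) ** 2                               # per-coefficient squared errors
print(np.isclose(D_signal, D_i.sum()))               # True
```

This is what allows each coefficient's quantizer to be chosen by looking only at its own contribution $D_i$.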

The fundamental trade-off in quantization is between rate (number of bits used) and distortion (approximation error), and it is formalized by rate-distortion theory [28, 121]. A rate-distortion function for a given source specified by a statistical model indicates precisely the achievable trade-off. While rate-distortion bounds are usually not closely met in practice, implementable systems behave similarly. Figure 7.6(a) shows a possible rate-distortion function as well as points reached by a practical system (called an operational rate-distortion curve). Note that the true rate-distortion function is convex, while the operational one is not necessarily so.

For example, for high-resolution scalar quantization, the distortion $D_i$ is related to the rate $R_i$ as (see (7.1.15))

$$D_i(R_i) \approx C_i\, \sigma_i^2\, 2^{-2R_i}, \qquad (7.1.18)$$

where $C_i$ is a constant depending on the pdf of the quantized variable (for example, in the case of a zero-mean Gaussian variable, $C_i = \sqrt{3}\,\pi/2$).

Returning to our initial problem as stated in (7.1.16) and (7.1.17), we will consider a two-variable case for illustration. Assume we separately code two variables $x_0$ and $x_1$, each having a given rate-distortion function. A key property we assume is that both rate and distortion are additive. This is, for example, the case in transform coding if the coefficients are independent. How shall we allocate bits to each variable so as to minimize distortion? It is important to note that in a rate-distortion problem, we have to consider both rate and distortion in order to be optimal. Since the two quantities are measured in different units (one is bits and the other is MSE), we use a new cost function $L$ combining the two through a positive Lagrange multiplier $\lambda$:

$$L = D + \lambda R, \qquad L_i = D_i + \lambda R_i, \quad i = 0, 1,$$

where $L = L_0 + L_1$. Finding a minimum of $L$ (which now depends on $\lambda$) amounts to finding minima for each $L_i$ (because the costs are additive). Writing distortion as a function of rate, $D_i(R_i)$, and taking the derivative to find a minimum, we get

$$\frac{\partial L_i}{\partial R_i} = \frac{\partial D_i(R_i)}{\partial R_i} + \lambda = 0,$$

that is, the slope of each rate-distortion function is equal to $-\lambda$, for $i = 0, 1$, and

$$\frac{\partial D_0(R_0)}{\partial R_0} = \frac{\partial D_1(R_1)}{\partial R_1} = -\lambda.$$

Uniqueness follows from the convexity of the rate-distortion curves. Thus, for a solution to be optimal, the chosen rates $R_0$ and $R_1$ have to correspond to constant-slope points on their respective rate-distortion curves [262], as shown in Figure 7.6(b). This solution is also very intuitive. Consider what would happen if $(R_0, D_0)$ and $(R_1, D_1)$ did not have the same slope, and suppose that the curve of $x_0$ is much steeper at $(R_0, D_0)$ than the curve of $x_1$ is at $(R_1, D_1)$. We assume we are within the budget $R$, that is, $R = R_0 + R_1$. Increase now the rate $R_0$ by $\varepsilon$. Since we need to stay within the budget, we have to decrease the rate $R_1$ by the same amount. In the process, we have decreased the distortion $D_0$ and increased the distortion $D_1$. However, since we assumed that the first slope is steeper, the decrease in $D_0$ is larger than the increase in $D_1$, so the move paid off: we remained within the same budget while decreasing the overall distortion. Repeating the process, we move closer and closer to the optimal solution. Once we reach the point where both slopes are the same, we do not gain anything by moving further.

A constant-slope solution is obtained for any fixed value of $\lambda$. To enforce the constraint (7.1.17) exactly, one has to search over slopes $\lambda$ until the budget is met, and then we have an optimal solution that satisfies the constraint. In practice, the exact functions $D_i(R_i)$ might not be known, but one can still apply similar ideas to operational rate-distortion curves [262]. The main point of our discussion was to indicate the philosophy of the approach: Based on rate-distortion curves, find operating points that satisfy an optimality criterion, and search until the budget constraint is satisfied as well.
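The philosophy just described can be sketched on operational rate-distortion points: minimize each $D_i + \lambda R_i$ separately, then search over $\lambda$ until the budget is met. In this Python sketch the operational curves are assumptions. They use integer rates $0, \ldots, 6$ with distortions taken from the high-resolution model (7.1.18) with $C_i = 1$ and hypothetical variances. Bisection on $\lambda$ is one simple search strategy, and the brute-force search is included only to check the result on this small example.

```python
import itertools

def allocate_for_slope(curves, lam):
    """For fixed lam, minimize each L_i = D_i + lam * R_i separately.
    curves[i] is a list of operational (rate, distortion) points, sorted by
    rate, so ties go to the lower rate."""
    return [min(curve, key=lambda p: p[1] + lam * p[0]) for curve in curves]

def total_rate(points):
    return sum(r for r, _ in points)

def lagrangian_allocation(curves, budget, iters=100):
    """Bisect on lam until the total rate fits the budget (7.1.17).
    Assumes the budget is at least the sum of the smallest rates."""
    hi = 1.0
    while total_rate(allocate_for_slope(curves, hi)) > budget:
        hi *= 2.0                        # a steeper slope buys fewer bits
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total_rate(allocate_for_slope(curves, mid)) <= budget:
            hi = mid                     # feasible: try a smaller lam (more bits)
        else:
            lo = mid
    return allocate_for_slope(curves, hi)

def brute_force(curves, budget):
    """Exhaustive search, only to check small examples."""
    best = None
    for choice in itertools.product(*curves):
        if total_rate(choice) <= budget:
            d = sum(dist for _, dist in choice)
            if best is None or d < best[0]:
                best = (d, list(choice))
    return best

variances = [16.0, 4.0, 1.0]                          # hypothetical sources
curves = [[(r, v * 2.0 ** (-2 * r)) for r in range(7)] for v in variances]
points = lagrangian_allocation(curves, budget=6)
print([r for r, _ in points], sum(d for _, d in points))   # [3, 2, 1] 0.75
print(brute_force(curves, 6)[0])                            # 0.75
```

At the returned $\lambda$, every source sits at a point where its distortion drop for one more bit is no larger than $\lambda$, which is the discrete counterpart of the equal-slope condition.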
When high-resolution quantization approximations can be used, it is possible to give closed-form allocation expressions. Assume the $N$ sources have the same type of distribution but different variances. Then $D_i(R_i)$ is given by (7.1.18) with a fixed constant $C_i = C$. Taking the derivative, it follows that

$$\frac{\partial D_i(R_i)}{\partial R_i} = C'\, \sigma_i^2\, 2^{-2R_i},$$

with $C' = -2\ln 2 \cdot C$. The constant-slope solution, that is, $\partial D_i(R_i)/\partial R_i = -\lambda$, forces the rates to be of the following form, for some constant $\alpha$ independent of $i$:

$$R_i = \alpha + \log_2 \sigma_i.$$

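Since we also have the budget constraint (7.1.17),

$$\sum_{i=0}^{N-1} R_i = N\alpha + \sum_{i=0}^{N-1} \log_2 \sigma_i = R,$$

we find

$$\alpha = \frac{R}{N} - \frac{1}{N}\sum_{i=0}^{N-1} \log_2 \sigma_i$$

and

$$R_i = \frac{R}{N} + \log_2 \sigma_i - \frac{1}{N}\sum_{j=0}^{N-1} \log_2 \sigma_j = \bar{R} + \log_2 \frac{\sigma_i}{\rho}, \qquad (7.1.19)$$

where $\bar{R} = R/N$ is the mean rate and $\rho^2$ is the geometric mean of the variances,

$$\rho^2 = \left(\prod_{i=0}^{N-1} \sigma_i^2\right)^{1/N}.$$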

Note that each quantizer has the same average distortion

$$D_i = C\sigma_i^2\, 2^{-2R_i} = C\sigma_i^2\, 2^{-2(\bar{R} + \log_2(\sigma_i/\rho))} = C\sigma_i^2\, 2^{-2\bar{R}}\, 2^{2\log_2(\rho/\sigma_i)} = C\rho^2\, 2^{-2\bar{R}}. \qquad (7.1.20)$$

The result of this allocation procedure is intuitive, since the number of quantization levels allocated to the $i$th quantizer,

$$2^{R_i} = \frac{2^{\bar{R}}}{\rho}\,\sigma_i,$$

is simply proportional to the standard deviation, or spread, of the variable $x_i$. The allocation (7.1.19) can be modified for nonidentically distributed random variables and weighted errors (the $i$th error is weighted by $w_i$ in the total distortion). In this case, $\sigma_i^2$ in the allocation problem is replaced by $C_i \cdot w_i \cdot \sigma_i^2$, leading to the appropriate modification of (7.1.19).
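The closed-form allocation (7.1.19) and the equal-distortion property (7.1.20) can be evaluated directly. The Python sketch below assumes the high-resolution model with a common constant, set to $C = 1$ for simplicity; the variances in the examples are made up, and the function names are hypothetical. The last example illustrates the issue discussed next: a very small variance produces a negative rate.

```python
import math

def closed_form_allocation(variances, R_total):
    """Rates of (7.1.19): R_i = R/N + log2(sigma_i / rho), with rho^2 the
    geometric mean of the variances.  Results may be noninteger or negative."""
    N = len(variances)
    log2_rho = sum(0.5 * math.log2(v) for v in variances) / N   # log2 of rho
    return [R_total / N + 0.5 * math.log2(v) - log2_rho for v in variances]

def distortions(variances, rates, C=1.0):
    """Per-quantizer distortion C * sigma_i^2 * 2^(-2 R_i), as in (7.1.18)."""
    return [C * v * 2.0 ** (-2 * r) for v, r in zip(variances, rates)]

rates = closed_form_allocation([16.0, 4.0, 1.0], 6)
print(rates)                                          # [3.0, 2.0, 1.0]
print(distortions([16.0, 4.0, 1.0], rates))           # [0.25, 0.25, 0.25], as (7.1.20) predicts
print(closed_form_allocation([100.0, 1.0, 0.01], 3))  # the last rate is negative
```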

The problem with the above allocation procedure is that the resulting rates are noninteger and, even worse, small variances can lead to negative allocations. Both problems can be tackled by starting with the solution given by (7.1.19) and forcing nonnegative integer allocations (this might, however, lead to slight suboptimality).

The next algorithm [109] tackles the problem directly by allocating one bit at 
a time to the quantizer where it is most needed. It is a "greedy" algorithm and 
not optimal, but leads to good solutions. Call Ri[n] the number of bits allocated 
to quantizer i at the nth iteration of the algorithm. Then, the algorithm iterates 
over n until all bits have been allocated and at each step, allocates the next bit to 
the quantizer j which has maximum distortion with the current allocation, 

Dj(Rj[n]) > Di(Ri[n}), i^j. 

That is, the next bit is allocated to where it is most needed. Since $D_i$ can be given in analytical form or measured on a training set, this algorithm is easily applicable. More sophisticated algorithms, optimal or near-optimal, are based on Lagrange methods applied to arbitrary rate-distortion curves [262].
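The greedy procedure can be sketched with a priority queue keyed on the current distortions. In this Python sketch, `distortion(i, r)` is a hypothetical callback returning $D_i(r)$, so it can wrap either an analytical model or a table measured on a training set; ties between equal distortions are broken by quantizer index.

```python
import heapq

def greedy_allocation(distortion, num_quantizers, total_bits):
    """Give bits one at a time to the quantizer with the largest current distortion."""
    rates = [0] * num_quantizers
    # heapq is a min-heap, so store negated distortions to pop the maximum first
    heap = [(-distortion(i, 0), i) for i in range(num_quantizers)]
    heapq.heapify(heap)
    for _ in range(total_bits):
        _, j = heapq.heappop(heap)      # quantizer with maximum D_j(R_j[n])
        rates[j] += 1
        heapq.heappush(heap, (-distortion(j, rates[j]), j))
    return rates

# High-rate model D_i(r) = sigma_i^2 * 2^(-2r); the common constant C does not change the ranking
sig = [4.0, 2.0, 1.0, 0.05]
print(greedy_allocation(lambda i, r: sig[i] ** 2 * 2.0 ** (-2 * r), 4, 6))  # [3, 2, 1, 0]
```

With this model, each extra bit divides a quantizer's distortion by four, which is why the bit goes to the currently worst quantizer.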

Coding Gain Now that we have discussed quantization and bit allocation, we 
can return to our study of transform coding and see what advantage is obtained by 
doing quantization in the transform domain (see Figure 7.1). 

First, recall that the Karhunen-Loeve transform leads to uncorrelated variables with variance $\lambda_i$ (see (7.1.8)). Assume that the input to the transform is zero-mean Gaussian with variance $\sigma_x^2$, and that fine quantization is used. This leads us to Proposition 7.2.

PROPOSITION 7.2 Optimality of Karhunen-Loeve Transform 

Among all block transforms and at a given rate, the Karhunen-Loeve trans- 
form will minimize the expected distortion. 



Proof 



After the KLT with optimal scalar quantization and bit allocation, the total distortion for all $N$ channels is (following (7.1.20)),

$$D_{KLT} = N\cdot C\cdot 2^{-2\bar{R}}\cdot\rho^2 = N\cdot C\cdot 2^{-2\bar{R}}\left(\prod_{i=0}^{N-1}\lambda_i\right)^{1/N}, \qquad (7.1.21)$$
where $C = \sqrt{3}\,\pi/2$ (see (7.1.18)). Since the determinant of a matrix is equal to the product of its eigenvalues, the last term is equal to $(\det(K_x))^{1/N}$, where $K_x$ is the autocovariance matrix (assuming zero mean, $K_x = R_x$). To prove the optimality of the KLT, we need the following inequality for the determinant of an autocorrelation matrix of $N$ zero-mean variables with variances $\sigma_i^2$ [109]:

$$\det(R_x) \le \prod_{i=0}^{N-1}\sigma_i^2, \qquad (7.1.22)$$

with equality if and only if R x is diagonal. It turns out that the more correlated the 
variables are, the smaller the determinant. 

Consider now an arbitrary orthogonal transform, with transform variables having variance $\sigma_i^2$. The distortion is

$$D_T = N\cdot C\cdot 2^{-2\bar{R}}\left(\prod_{i=0}^{N-1}\sigma_i^2\right)^{1/N}.$$

Because of (7.1.22) and the fact that the determinant is conserved by unitary transforms, this is greater than or equal to

$$D_T \ge N\cdot C\cdot 2^{-2\bar{R}}\det(R_x)^{1/N}.$$

Since the KLT diagonalizes $R_x$, equality is reached by the KLT, following (7.1.21). This proves that if the input to the transform is Gaussian and the quantization is fine, the KLT is optimal among all unitary transforms.

What is the gain we just obtained? If the samples are directly quantized, the distortion will be

$$D_{PCM} = N\cdot C\cdot 2^{-2\bar{R}}\cdot\sigma_x^2, \qquad (7.1.23)$$

(where PCM stands for pulse code modulation, that is, sample-by-sample quantization) and the coding gain due to optimal transform coding is

$$\frac{D_{PCM}}{D_{KLT}} = \frac{\sigma_x^2}{\rho^2} = \frac{\frac{1}{N}\sum_{i=0}^{N-1}\sigma_i^2}{\left(\prod_{i=0}^{N-1}\sigma_i^2\right)^{1/N}}, \qquad (7.1.24)$$

where we used the fact that $N\cdot\sigma_x^2 = \sum_{i=0}^{N-1}\sigma_i^2$. Recalling that the variances $\sigma_i^2$ are the eigenvalues of $R_x$, it follows that the coding gain is the ratio of the arithmetic and geometric means of the eigenvalues of the autocorrelation matrix (under the zero-mean assumption). By the arithmetic-geometric mean inequality, the gain is at least 1, and this lower bound is attained only if all eigenvalues are identical.
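To get a feel for the size of the coding gain (7.1.24), the Python/NumPy sketch below evaluates the arithmetic-to-geometric mean ratio of the eigenvalues of an autocorrelation matrix. The first-order autoregressive (AR(1)) model, with $R_x[k,l] = \sigma_x^2 a^{|k-l|}$, is an assumption chosen for illustration; for unit variance its determinant is $(1-a^2)^{N-1}$, so the gain equals $(1-a^2)^{-(N-1)/N}$.

```python
import numpy as np

def ar1_autocorrelation(n, a, var=1.0):
    """Autocorrelation matrix of a zero-mean AR(1) process: R[k, l] = var * a**|k - l|."""
    idx = np.arange(n)
    return var * a ** np.abs(idx[:, None] - idx[None, :])

def coding_gain(r):
    """Arithmetic over geometric mean of the eigenvalues of R_x, as in (7.1.24)."""
    lam = np.linalg.eigvalsh(r)  # variances of the KLT coefficients
    return lam.mean() / np.exp(np.log(lam).mean())

r = ar1_autocorrelation(8, 0.95)
print(coding_gain(r))           # well above 1 for this strongly correlated model
print(coding_gain(np.eye(8)))   # 1 (up to rounding): white input, equal eigenvalues
print(np.linalg.det(r) <= np.prod(np.diag(r)))  # True, as (7.1.22) requires
```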

Subband coding, being a generalization of transform coding, has a similar be- 
havior. If the input is Gaussian, the channel signals are Gaussian as well. If the 
filters are ideal bandpass filters, the channels will be decorrelated. In any case, the 
distortion resulting from optimally allocating $R = N\cdot\bar{R}$ bits across $N$ channels with variances $\sigma_i^2$ is, as in the usual transform case,

$$D_{SBC} = N\cdot C\cdot 2^{-2\bar{R}}\cdot\rho^2,$$

where $\rho^2$ is the geometric mean of the subband variances. Using (7.1.23) for direct quantization, we get, similarly to (7.1.24), the subband coding gain as

$$\frac{D_{PCM}}{D_{SBC}} = \frac{\frac{1}{N}\sum_{i=0}^{N-1}\sigma_i^2}{\left(\prod_{i=0}^{N-1}\sigma_i^2\right)^{1/N}},$$

where the $\sigma_i^2$'s are the subband variances. That is, if the spectrum is far from
being flat, there will be a large coding gain in subband methods. This is to be 
expected, since it becomes possible to match the spectral characteristics of the 
signal very closely, unlike in a sample-domain quantization. It is worthwhile to note 
that when the number of channels grows to infinity, both transform and subband 
coding achieve the theoretical performance of predictive coding with an infinitely long predictor [143].

The obvious question is, of course, how do transform and subband coding compare? The ratio of $D_{KLT}$ and $D_{SBC}$ is

$$\frac{D_{KLT}}{D_{SBC}} = \frac{\rho_{KLT}^2}{\rho_{SBC}^2},$$

that is, the one with the smaller geometric mean wins. Qualitatively, the one with 
the larger spread in variances will achieve better coding gain. The exact comparison 
thus requires measurements of variances in specific transforms (such as the DCT) 
versus filter banks (of finite length rather than ideal ones). 

While the above considerations use some idealized assumptions, the concept 
holds true in general: The wider the variations between the component signals 
(transform coefficients or subbands), the higher the potential for coding gain. More 
about the above can be found in [5, 220, 273, 292, 295]. 

7.1.3 Entropy Coding 

The last step in transform coding as shown in Figure 7.1 is entropy coding. Similarly to the first step, it is reversible and thus there is no approximation problem as in quantization. After quantization, the variables take values drawn from a finite set $\{a_i\}$. The idea is to find a reversible mapping $M$ to a new set $\{b_i\}$ such that the average number of bits/symbol is minimized. A historical example is the Morse code, which assigns short codes to the letters that appear frequently in the English language while reserving long codes for less frequent ones. The parameters in searching for the mapping $M$ are the probabilities of occurrence of the symbols $a_i$, $p(a_i)$. If the quantized variable is stationary, these probabilities are fixed, and a
fixed mapping such as Huffman coding can be used. If the probabilities evolve over 
time, more sophisticated adaptive methods such as adaptive arithmetic coding can 
be used. Such mappings will transform fixed-length codewords into variable-length 
ones, creating a variable-length bit stream. If a constant bit rate channel is used, 
buffering has to smooth out variations so as to accommodate the fixed-rate channel. 

Huffman Coding Given an alphabet $\{a_i\}$ of size $M$ and its associated probabilities of occurrence $p(a_i)$, the goal is to find a mapping $b_i = F(a_i)$ such that the average length $l(b_i)$ is minimized:

$$E(l(b_i)) = \sum_{i=0}^{M-1} p(a_i)\, l(b_i). \qquad (7.1.25)$$

We also require that a sequence of $b_i$'s should be uniquely decodable (note that invertibility of $F$ is not sufficient). This last requirement puts an extra constraint on the codewords $b_i$, namely, no codeword is allowed to be a prefix of another one. Then, the stream of $b_i$'s can be uniquely decoded by sequentially removing codewords $b_i$. The lower bound of the expected length (7.1.25) is given by the entropy of the set $\{a_i\}$

$$H_a = -\sum_{i=0}^{M-1} p(a_i)\log_2(p(a_i)). \qquad (7.1.26)$$

Huffman's construction elegantly meets the prefix condition while coming quite 
close to the entropy lower bound. The design is guided by the following property 
of optimum binary prefix codes: The two least probable symbols have codewords 
of equal length which differ only in the last symbol. 

The design of the Huffman code is best looked at as growing a binary tree 
from the leaves up to the root. The codeword will be the sequence of zeros and 
ones encountered when going from the root to the leaf corresponding to the desired
symbol. Start with a list of the probabilities of the symbols. Then, take the two 
least probable symbols and make them two nodes with branches (labeled "0" and
"1") to a common node which represents a new symbol. The new symbol has a
probability which is the sum of the two probabilities of the merged symbols. The 
new list of symbols is now shorter by one. Iterate until only one symbol is left. The 
codewords can now be read off along the branches of the binary tree. Note that at 
every step, we have used the property of optimum binary prefix codes so that the 
two least probable symbols were of equal length and had the same prefix. 
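The merging procedure translates directly into code. This Python sketch keeps the current list of (merged) symbols in a heap, pops the two least probable nodes, and prepends "0" and "1" to their codewords, so the codewords grow from the leaves towards the root. Ties are broken by insertion order, so the codewords may differ from those of Table 7.1, although the average length of any Huffman code for the same probabilities is the same.

```python
import heapq
import itertools
import math

def huffman_code(probs):
    """Return a dict symbol -> codeword for a binary Huffman code.

    probs maps each symbol to its probability of occurrence.
    """
    if len(probs) == 1:                      # degenerate one-symbol alphabet
        return {s: "0" for s in probs}
    tie = itertools.count()                  # tie-breaker so heapq never compares dicts
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)   # least probable node: branch "0"
        p1, _, code1 = heapq.heappop(heap)   # next least probable node: branch "1"
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

def entropy(probs):
    """Entropy H_a of (7.1.26), in bits."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def average_length(probs, code):
    """Expected codeword length E(l(b_i)) of (7.1.25), in bits."""
    return sum(p * len(code[s]) for s, p in probs.items())

p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(p)
print(code, average_length(p, code), entropy(p))
```

For these dyadic probabilities the codeword lengths are 1, 2, 3 and 3, and the average length equals the entropy, 1.75 bits.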
Table 7.1 Symbols, probabilities and resulting possible Huffman codewords, where $H_a = 2.28$ bits and $E[l(b_i)] = 2.35$ bits. First, the symbols are merged going from (a) to (e). Then, the codewords are assigned going from (e) to (a).
Figure 7.7 Huffman code derived from a binary tree and corresponding to the symbol probabilities given in Table 7.1.



Example 7.1 Huffman Coding 

An example is given in Figure 7.7 where a Huffman tree is shown for the symbol probabilities 
given in Table 7.1(a). Let us first consider only the first two columns of each of the tables. 
We start from left to right and in Table 7.1(a) choose the two symbols with the lowest 
probabilities, that is, 4 and 5, and merge them. We then reorder the symbols in
decreasing order, and form Table 7.1(b). Now the process is repeated, joining symbols
3 and (4 + 5). After a couple more steps, we obtain the final Table 7.1(e). Now we start 
assigning codewords, going from right to left. Thus, 0.6 gets a "1", and 0.4 gets a "0". 
Then we split 0.6, and assign "10" to 0.35, and "11" to 0.25. The final result of the whole 
procedure is given in Table 7.1(a) and Figure 7.7. 
Note that the average length $E(l(b_i))$ of a Huffman code, given in (7.1.25), reaches the theoretical lower bound given by the entropy (7.1.26) only if all symbol probabilities are negative integer powers of two (of the form $2^{-k}$). This is a limitation of Huffman coding, which can be surmounted by using arithmetic coding. Arithmetic coding is more complicated to implement and, in its simplest form, it also requires a priori knowledge of the symbol probabilities. If the source matches the probabilities used to design the arithmetic coder, then the rate approaches the entropy arbitrarily closely for long sequences. See [24] and [109] for more details.
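A minimal arithmetic coder shows why the rate approaches the entropy: the coding interval shrinks by a factor $p(a_i)$ per symbol, so a message of probability $P$ can be identified by at most $\lceil -\log_2 P\rceil$ bits. This Python sketch uses exact rational arithmetic from the standard `fractions` module to sidestep the finite-precision issues a practical coder must handle; it assumes known, fixed probabilities that sum to 1 and a decoder that is told the message length. Practical coders instead emit bits incrementally with integer arithmetic.

```python
import math
from fractions import Fraction

def intervals(probs):
    """Assign each symbol the subinterval [low, low + p) of [0, 1)."""
    table, low = {}, Fraction(0)
    for s, p in probs.items():
        table[s] = (low, low + p)
        low += p
    return table

def arithmetic_encode(seq, probs):
    """Narrow [0, 1) once per symbol, then return the shortest binary fraction inside it."""
    table = intervals(probs)
    low, width = Fraction(0), Fraction(1)
    for s in seq:
        a, b = table[s]
        low, width = low + width * a, width * (b - a)   # nested subinterval
    high = low + width
    k = 1
    while True:
        m = math.ceil(low * 2 ** k)          # smallest multiple of 2**-k that is >= low
        if Fraction(m, 2 ** k) < high:
            return format(m, f"0{k}b")       # k-bit codeword
        k += 1

def arithmetic_decode(bits, n, probs):
    """Recover n symbols by locating the received value inside nested subintervals."""
    table = intervals(probs)
    value = Fraction(int(bits, 2), 2 ** len(bits))
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(n):
        target = (value - low) / width
        for s, (a, b) in table.items():
            if a <= target < b:
                out.append(s)
                low, width = low + width * a, width * (b - a)
                break
    return out

p = {"a": Fraction(3, 4), "b": Fraction(1, 8), "c": Fraction(1, 8)}
bits = arithmetic_encode("aaabacaaab", p)
print(bits, arithmetic_decode(bits, 10, p))
```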



Adaptive Entropy Coding While the above approaches come close to the entropy 
of a known stationary source, they fail if the source is not well-known or changes 
significantly over time. A possible solution is to estimate the probabilities on the 
fly (by counting occurrences of the symbols at both the encoder and decoder) and 
modify the Huffman code accordingly. While this seems complicated at first sight, it 
turns out that only minor modifications are necessary, since only a single probability 
is affected by an entering symbol [105, 109]. 

Arithmetic coding can be modified as well, in order to estimate probabilities 
on the fly. This adaptive version is known as a Q-coder [221]. Finally, Ziv-Lempel 
coding [342] is an elegant lossless coding technique which uses no a priori proba- 
bilities. It builds up a dictionary of encountered subsequences in such a way that 
the decoder can build the same dictionary. Then, the encoder sends only the index 
to an encountered entry. The dictionary size is fixed and the index uses a fixed 
number of bits. Thus, the Ziv-Lempel coding maps variable-size input sequences 
into fixed-size codewords, a dual of the Huffman code. The only limitation of the 
Ziv-Lempel code is its fixed-size dictionary, which leads to loss in performance when 
very long sequences are encoded. No new entries can be created once the dictionary 
is full and the remainder of the sequence has to be coded with the current entries. 
Modifications of the basic algorithm allow for dictionary updates. Note that since 
there are many variations on this theme, we refer to [24] for a thorough discussion. 
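The dictionary mechanism can be illustrated with LZW, Welch's variant of Ziv-Lempel coding in which only dictionary indices are sent. In this Python sketch, both sides start from the single-symbol alphabet, add one entry per emitted index, and stop adding entries once `max_size` (an illustrative parameter) is reached, which mirrors the fixed-size dictionary discussed above. The decoder rebuilds the same dictionary from the indices alone.

```python
def lzw_encode(data, alphabet, max_size=4096):
    """Emit dictionary indices; the dictionary is frozen once it holds max_size entries."""
    dictionary = {ch: i for i, ch in enumerate(alphabet)}
    out, w = [], ""
    for ch in data:
        if w + ch in dictionary:
            w += ch                              # keep extending the current match
        else:
            out.append(dictionary[w])            # send index of the longest match
            if len(dictionary) < max_size:
                dictionary[w + ch] = len(dictionary)
            w = ch
    if w:
        out.append(dictionary[w])
    return out

def lzw_decode(codes, alphabet, max_size=4096):
    """Rebuild the encoder's dictionary from the indices and expand them."""
    inverse = {i: ch for i, ch in enumerate(alphabet)}
    if not codes:
        return ""
    w = inverse[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in inverse:
            entry = inverse[code]
        else:
            # index of the entry the encoder created in the previous step
            entry = w + w[0]
        out.append(entry)
        if len(inverse) < max_size:
            inverse[len(inverse)] = w + entry[0]
        w = entry
    return "".join(out)

codes = lzw_encode("abababababbbbaaab", "ab")
print(codes, lzw_decode(codes, "ab"))
```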



Run-Length Coding Another important lossless coding technique is run-length
coding [138]. It is useful when a sequence of samples consists of stretches of zeros
followed by small packs of nonzero samples (this is typically encountered in subband 
image coding at the outputs of the highpass channels after uniform quantization 
with a dead zone, as in Section 7.3.3). It is thus advantageous to encode the length 
of the stretch of zeros, to then encode the values of the nonzero samples and then 
an indicator of the start of another run of zeros. Of course, both the length of runs 
and the nonzero values can be entropy coded. 
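A run-length coder of this kind can be sketched in a few lines. For simplicity, this Python version pairs each run of zeros with the single nonzero value that ends it, and signals a trailing run of zeros with a value of 0; both choices are assumptions of the sketch rather than a particular standard. In practice, the resulting pairs would then be entropy coded.

```python
def run_length_encode(samples):
    """Encode samples as (zero_run_length, nonzero_value) pairs."""
    pairs, run = [], 0
    for x in samples:
        if x == 0:
            run += 1
        else:
            pairs.append((run, x))   # zeros preceding x, then x itself
            run = 0
    if run:
        pairs.append((run, 0))       # trailing zeros, flagged by value 0
    return pairs

def run_length_decode(pairs):
    out = []
    for run, value in pairs:
        out.extend([0] * run)
        if value != 0:
            out.append(value)
    return out

print(run_length_encode([0, 0, 0, 5, 0, 0, -1, 2, 0, 0, 0, 0]))
```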
7.1.4 Discussion

So far we have separately considered the three building blocks of a transform coder 
as depicted in Figure 7.1. Some interaction between the transform and the quan- 
tization was discussed when proving the optimality of the KLT. Including entropy 
coding after quantization can change the way quantization should be done. In 
the high-rate, memoryless⁴ case, uniform quantization followed by entropy coding
turns out to be better than using nonuniform quantization and fixed codewords 
[109]. However, this leads to variable-rate schemes and thus requires buffering 
when fixed-rate channels are used. This is done with a finite-size buffer, which has 
a nonzero probability of overflow. Therefore, a buffer control algorithm is needed. 
This usually means moving to coarser quantization when the buffer is close to over- 
flow and finer quantization in the underflow case. Obviously, in the overflow control 
case, there is a loss in performance in such variable-rate schemes. The size of the 
buffer is limited for cost reasons, but also because of the delay it produces in a 
real-time transmission case. 

Our discussion has focused on MSE-based coding, but we indicated that it 
extends readily to weighted MSE. Such weights are usually based on perceptual 
criteria [141, 142], and will be discussed later. We note that certain "tricks" such
as the dead zone quantizers used in image compression (uniform quantizers with a 
zone around zero larger than the step size that maps to the origin) are heuristics 
derived from experiments that are not optimal in the sense discussed so far, but 
which produce visually more pleasing images. 
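A dead zone quantizer is easy to state in code. In this Python sketch, `step` and `deadzone` (the width of the zero bin) are illustrative parameters; samples with magnitude below `deadzone / 2` map to zero, and reconstruction uses the bin midpoint, which is one common choice but not the only one.

```python
def deadzone_quantize(x, step, deadzone):
    """Uniform quantizer whose zero bin has width `deadzone` (typically > step)."""
    if x == 0 or abs(x) < deadzone / 2:
        return 0
    sign = 1 if x > 0 else -1
    # bins of width `step` start at the edge of the dead zone
    return sign * (1 + int((abs(x) - deadzone / 2) // step))

def deadzone_reconstruct(q, step, deadzone):
    """Reconstruct at the midpoint of the selected bin."""
    if q == 0:
        return 0.0
    sign = 1 if q > 0 else -1
    return sign * (deadzone / 2 + (abs(q) - 0.5) * step)

print([deadzone_quantize(v, 1.0, 2.0) for v in (0.9, 1.0, 1.99, -2.5)])
```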

7.2 Speech and Audio Compression 

In this section, we consider the use of signal expansions for one-dimensional signal
compression. Subband methods are successful for medium compression of speech 
[68, 94, 103, 192], and high quality compression of audio [34, 77, 147, 267, 279, 
290, 333]. At other rates (for example, low bit rate speech compression) different 
methods are used, which we will briefly indicate as well. 

7.2.1 Speech Compression 

Production-Model Based Compression of Speech A particularity of speech 
is that a good production model can be identified. The vocal cords produce an 
excitation function which can be roughly classified into voiced (pulse-train like) 
and unvoiced (noise-like) excitation. The vocal tract, mouth, and lips act as a filter 
on this excitation signal. Therefore, very high compression systems for speech are 



4 Memoryless means that the output value at a present time depends only on the present input 
value and not on any past or future values. 
based on identifying the parameters of this speech production model. Typically,
linear prediction is used to identify a linear filter of a certain order which will 
whiten the speech signal (this is therefore the inverse filter of the speech production 
model). Then, the residual signal is analyzed to decide if the speech was voiced or 
unvoiced, and in the former case, to identify the pitch. Such an analysis is done on 
a segment-by-segment basis. It reduces the original speech signal to a small set of 
parameters: voiced/unvoiced decision plus pitch value in the voiced case and filter 
coefficients (up to 16 typically). At the decoder, the speech is synthesized following 
the production model and using the parameters identified at the encoder. As is to be
expected, this approach leads to very high compression factors. Speech sampled at
8 kHz with 8 bits/sample, that is, at 64 Kbits/sec, is compressed down to as low 
as 2.4 Kbits/sec with adequate intelligibility but some lack of naturalness [141]. At 
8 to 16 Kbits/sec, sophisticated versions of linear predictive coders achieve what is 
called "toll quality," that is, they can be used on public telephone networks. Instead 
of simple voiced/unvoiced excitation, these higher-quality coders use a codebook
from which the best excitation function is chosen. An important advantage of linear 
predictive coding (LPC) of speech is that low delay is achievable. 
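The whitening step of linear predictive coding can be sketched with the autocorrelation method and the Levinson-Durbin recursion, a standard way of solving the prediction normal equations (the text does not prescribe a particular solver). This Python/NumPy sketch estimates a prediction-error filter $A(z) = 1 + a_1 z^{-1} + \cdots + a_p z^{-p}$ on one segment and filters the segment with it; windowing, the voiced/unvoiced decision, pitch estimation and the quantization of the coefficients are all omitted.

```python
import numpy as np

def autocorrelation(x, order):
    """Biased autocorrelation r[0..order] of one speech segment."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])

def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.

    Returns (a, err): a[0] = 1 and err is the final prediction-error energy.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                          # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]     # update lower-order coefficients
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def whiten(x, a):
    """Prediction error e[n] = sum_j a_j x[n - j], the (approximately) white residual."""
    return np.convolve(x, a)[: len(x)]
```

On a synthetic second-order autoregressive segment (an assumption used only for testing), the estimated filter recovers the generating coefficients and the residual has a much smaller variance than the input.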

High-Quality Speech Compression Certain applications require speech com- 
pression with better than telephone quality (for example, audio conferencing). This 
is often called wideband speech [141] since the sampling rate is raised from 8 kHz 
to 14 kHz. Because of the desire for high quality, more attention is focused on the 
perception process, since the goal is to attain a perceptually transparent coding. 
That is, masking patterns of the auditory system are taken advantage of, so as to 
place quantization noise in the least sensitive regions of the spectrum. In that sense, 
wideband speech coding is similar to audio coding, and we defer the discussion of 
masking to the next section. One difference, however, is the delay constraint which 
is stringent for real-time interactive speech compression, while being relaxed in the 
audio compression case, since the latter is usually performed offline.

7.2.2 High-Quality Audio Compression 

Perceptual Models The auditory system is often modeled as a filter bank in a 
first approximation. This filter bank is based on critical bands [254], as shown in 
Figure 7.8 and Table 7.2. The key features of such a spectral view of hearing are [146]:

(a) A constant relative bandwidth behavior of the filter (see Figure 7.8).

(b) Masking properties of dominant sounds over weaker ones within a critical band 
and over nearby bands, as given by a spreading function. 
Figure 7.8 Critical bands of the auditory system. Bandpass filters' magnitude response on a logarithmic frequency axis.
Figure 7.9 Generic perceptual coder for high-quality audio compression (after [146]).

The critical bands can be seen as pieces of the spectrum that are considered as an 
entity in the auditory process. For example, a sine wave centered in a given critical 
band will mask noise in this band, but not outside. While the masking properties 
are very complex and only partly understood, the basic concepts can be successfully 
used in an audio compression system. 

Unlike in the case of speech compression, there is no source model for general 
audio signals. However, there is a good perceptual model of the auditory process, 
which can be used for achieving better compression through perceptual coding [141]. 



Perceptual Coders A perceptual coder for transparent coding of audio will at- 
tempt to keep quantization noise just below the level where it would become no- 
ticeable. Quantization noise within a critical band has to be controlled and an easy 
way to do that is to use a subband or transform coder. Also, permissible quanti- 
zation noise levels have to be calculated and this is based on some form of spectral 
analysis of the input. Therefore, a generic perceptual coder for audio is as depicted 
Table 7.2 Critical bands of the auditory system, which are of constant bandwidth at low frequencies (below 500 Hz) and of constant relative bandwidth at high frequencies [146].
in Figure 7.9. Note that one can use the analysis filter bank as a spectrum analyzer
or calculate a separate spectrum estimation. Usually, the two are integrated for 
computational reasons. 

A filter bank implementing the critical bands exactly is computationally infeasible. Instead, an approximation is used that has roughly a logarithmic behavior, with an initial octave-band filter bank, but uses short-time Fourier-like banks within the octaves to get finer analysis at reasonable computational cost. A possible example is shown in Figure 7.10, where LOT stands for lapped orthogonal
Figure 7.10 Filter bank example for the analysis part in a perceptual coder for audio. (a) Architecture. (b) Frequency resolution.



transforms and also refers to cosine-modulated filter banks⁵ (Section 3.4.3). Recently, Princen has proposed to use nonuniform modulated filter banks [227]. They achieve near-perfect reconstruction and, since they are a straightforward extension of the cosine-modulated filter banks, they are computationally efficient. High-quality
audio coding usually does not have to meet delay constraints and thus the delay 
due to the filter bank is not a problem. Typically, very long filters are used in order 
to get excellent band discrimination, and to avoid aliasing as much as possible since 
aliasing is perceptually very disturbing in audio. 

The next step consists of estimating the masking thresholds within the bands. 
Typically, a fast Fourier transform is performed in parallel with the filter bank. 
Based on the signal energy and spectral flatness within a critical band, the max- 
imum tolerable quantization noise level can be estimated. Typically, single tones 
can be identified, their associated masking function derived, and thus, the allow- 
able quantization steps follow. Bands which have amplitudes below this maximum 
step can be disregarded altogether. For a detailed description of the perceptual 
threshold calculations, refer to [145]. Note that this quantization procedure is quite 



5 Note that this filter bank is known under many names, such as LOT, MLT, MDCT, TDAC, 
Princen & Bradley filter bank, cosine modulated filter bank [188, 229, 228]. 
[Plot: frequency response of 32 subbands]

Figure 7.11 Magnitude response of the 32-channel filter bank used in MUSICAM. The prototype is a length-512 window, and cosine modulation is used to get the 32 modulated filters.



different from an MSE-based approach as discussed in Section 7.1.2, where only the 
variances within bands mattered. Sometimes, the perceptual and MSE approaches 
are combined. A first pass allocates an initial number of bits so as to satisfy the 
minimum perceptual requirements, while a second pass distributes remaining bits 
according to the usual MSE criteria.

The quantization and bit allocation are recalculated for every new segment of the
input signal, and sent as side information to the decoder. Because entropy coding 
is used on the quantized subband samples, the bit stream has to be buffered if fixed 
rate transmission is intended. Note that not all systems use entropy coding (for 
example, MUSICAM does not). 

7.2.3 Examples 

Various applications such as digital audio broadcasting (DAB) require CD-quality 
audio (44.1 kHz sampling and 16 bits/sample). This led to the development of
medium compression, high-quality standards for audio coding. 



MUSICAM Probably the best-known audio coding algorithm is MUSICAM
(Masking-pattern Universal Subband Integrated Coding and Multiplexing) [77, 
279], used in the MPEG-I standard, and thus frequently referred to as MPEG 
audio [38]. It is also conceptually the simplest coder. This system uses a 32-band 
uniform filter bank, obtained by modulation of a 512-tap prototype lowpass filter. 
The magnitude response of this filter bank is shown in Figure 7.11. One reason for 
Figure 7.12 Example of quantization based on psychoacoustics. (a) Line spectrum and associated masking function. (b) Quantization noise in the 32 subbands of MUSICAM taking advantage of masking.



choosing such a filter bank is that it has a reasonable computational complexity 
since it can be implemented with a polyphase filter followed by a fast transform (see 
Section 6.2). Another reason is its smaller delay when compared to a tree-structured 
filter bank. 

In parallel to the filter bank, a fast Fourier transform is used for spectral esti- 
mation. Based on the power spectrum, a masking curve is calculated, an example 
of which is shown in Figure 7.12. Quantization noise is then allocated in the var- 
ious subbands according to the masking function. This allocation is done on a 
small block of subband samples (typically 12). The maximum value within a block, 
called the scale factor, and the quantization step, based on masking, are calculated for each block. They are transmitted as side information, together with the quantized samples. MUSICAM does not use entropy coding; the quantized values are sent (almost) directly.
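The block-wise scale factor idea can be sketched as follows. Each block of subband samples (12 in the text) is normalized by its largest magnitude and quantized uniformly with a number of bits that, in a real coder, would come from the masking analysis; here `bits` is simply a parameter. The symmetric quantizer with $2^b - 1$ levels is an assumption of this Python/NumPy sketch, and the actual standard codes scale factors and step sizes with its own tables, which are not modeled.

```python
import numpy as np

def encode_block(block, bits):
    """Scale factor plus uniform quantization of one block (bits >= 2, or 0 = band not sent)."""
    block = np.asarray(block, dtype=float)
    scale = float(np.max(np.abs(block)))   # scale factor: largest magnitude in the block
    if bits == 0 or scale == 0.0:
        return scale, np.zeros(len(block), dtype=int)
    half = 2 ** (bits - 1) - 1             # 2**bits - 1 levels, symmetric around zero
    return scale, np.round(block / scale * half).astype(int)

def decode_block(scale, q, bits):
    if bits == 0 or scale == 0.0:
        return np.zeros(len(q))
    half = 2 ** (bits - 1) - 1
    return q / half * scale

subband = np.random.default_rng(0).standard_normal(36)
for block in subband.reshape(-1, 12):      # blocks of 12 subband samples
    scale, q = encode_block(block, 4)
    print(round(scale, 3), q)
```

The side information per block is the scale factor and the bit allocation; the samples themselves are sent as the integers `q`.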

The resulting system compresses audio signals of about 700 Kbits/sec (44.1 
kHz, 16 bit samples) down to around 128 Kbits/sec, without audible impairments 
[77, 279]. When used on stereo signals, it leads to a bit rate of 256 Kbits/sec. 

PAC Coder An interesting coder for high-quality compression of audio is the PAC 
(Perceptual Audio Coder) coder [147]. In its stereo version, it has been proposed 
for digital audio broadcasting as well as for a nonbackward compatible MPEG-II 
audio compression system. 

The coder has the basic blocks that are typical of many perceptual coders, 
given in Figure 7.9. The signal goes through a filter bank and a perceptual model. 
Then the outputs of the filter bank and the perceptual model are fed into PCM 
quantization, Huffman coding and rate control. 

The filter bank is based on the cosine modulated banks presented in Sec- 
tion 3.4.3, with window switching. The psychoacoustic analysis provides a noise 
threshold for L (Left), R (Right), S (Sum) and D (Difference) channels, where 
$S = L + R$ and $D = L - R$. One feature of the PAC algorithm is that it is adaptive in time and frequency since, in each frequency band, it sends either the $(L, R)$ or the $(S, D)$ signals, depending on which one is more efficient.
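One way to make "more efficient" concrete, assumed here purely for illustration, is to reuse the high-rate MSE model of Section 7.1.2: for two channels at a common rate, the distortion is proportional to the geometric mean of their variances, so the representation with the smaller product of channel energies is preferred. This Python/NumPy sketch scales S and D by $1/\sqrt{2}$ so that the $(S, D)$ pair is an orthonormal transform of $(L, R)$; it ignores the perceptual thresholds that the psychoacoustic analysis provides for each channel.

```python
import numpy as np

def choose_stereo_mode(left, right):
    """Return "L/R" or "S/D" for one band under a high-rate MSE cost (an assumption)."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    s = (left + right) / np.sqrt(2.0)   # S = L + R, scaled to keep the transform orthonormal
    d = (left - right) / np.sqrt(2.0)   # D = L - R, same scaling
    # At a common rate, distortion grows with the geometric mean of the channel
    # variances, so compare the products of the channel energies.
    lr_cost = np.mean(left ** 2) * np.mean(right ** 2)
    sd_cost = np.mean(s ** 2) * np.mean(d ** 2)
    return "S/D" if sd_cost < lr_cost else "L/R"
```

Nearly identical left and right signals favor $(S, D)$, since $D$ is then almost zero, while unrelated channels of very different power favor $(L, R)$.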

This coder provides transparent or near-transparent quality coding at 
192 Kbits/sec/stereo pair, and high-quality coding at 128 Kbits/sec/stereo pair. 

AC System Two well-known algorithms for high-quality audio compression are 
the AC-2 and AC-3 algorithms, coming from Dolby [34, 290]. Both have stereo and five-channel surround versions.

The AC-2 version exploits both the time-domain and frequency-domain psy- 
choacoustic models. It uses a time-frequency division scheme, achieving a trade- 
off between time and frequency resolutions, on a signal-dependent basis. This is 
achieved by selecting the optimal transform block length for each 10-ms analysis interval. The filter bank is again based on the cosine-modulated filter bank [229, 228]. This coder operates at a variety of bit rates ranging from 64 to 192 Kbits/sec/channel. The 128 Kbits/sec/channel AC-2 version has been selected for use in a new multichannel NTSC compression system [34].

As can be seen from the above three examples, filter bank methods had a 
substantial impact on audio compression systems. Note that sophisticated time- 
frequency analysis is a key component. 

7.3 Image Compression 

Multiresolution techniques are most naturally applied to images, where notions 
such as resolution and scale are very intuitive. Multiresolution techniques have 
been used in computer vision for tasks such as object recognition and motion es- 
timation as well as in image compression, with pyramid [41] and subband coding 
[111, 314, 337]. An important feature of such image compression techniques is 
their successive approximation property: As higher frequencies are added (which is equivalent to adding more bands in subband coding, or difference signals in pyramids),
higher-resolution images are obtained. Note that multiresolution successive approx- 
imation corresponds to the human visual system which helps the multiresolution 
techniques in terms of perceptual quality. Transform coding also has a successive 
approximation property (see the discussion on the Karhunen-Loeve transform in 
Section 7.1.1) and is thus part of this broad class of techniques which are char- 
acterized by multiresolution approximations. In short, besides good compression 
capabilities, these schemes allow partial decoding of the coded version, which leads to usable subresolution approximations.

We start by discussing the standard image compression schemes, which are based 
on block transforms such as the discrete cosine transform (DCT) or overlapping 
block transforms such as the lapped orthogonal transform. This leads naturally to a description of the current image compression standard based on the DCT, called JPEG [148, 327], indicating some of the constraints of a "real-world" compression system.

We continue by discussing pyramid coding, which is a very simple but flexi- 
ble image coding method. A detailed treatment of subband/wavelet image coding
follows. Several important issues pertaining to the choice of the filters, the decom- 
position structure, quantization and compression are discussed and some examples 
are given. 

Following these standard coding algorithms, we describe some more recent and 
sometimes exploratory compression schemes which use multiresolution as an in- 
gredient. These include image compression methods based on wavelet maximums 
[184], and a method using adaptive wavelet packets [15, 233]. We also discuss 
some recent work on a successive approximation method for image coding using 
subband/wavelet trees [259], quantization error analysis in a subband system [331], 
joint design of quantization and filtering for subband coding [161], and nonorthog- 
onal subband coding [200]. 

Note that in all experiments, we use the standard image Barbara, with 512 x 512 
pixels and 8-bit gray-scale values (see Figure 7.13). For comparison purposes, we 
will use the peak signal-to-noise ratio ($\mathrm{SNR}_p$) given by (7.1.3).
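For reference, the peak SNR used in the comparisons can be computed as in the Python/NumPy sketch below. It assumes the usual definition for 8-bit images, $\mathrm{SNR}_p = 10\log_{10}(255^2/\mathrm{MSE})$ in dB, taken here to match (7.1.3); the `peak` parameter would change for other bit depths.

```python
import numpy as np

def peak_snr(original, coded, peak=255.0):
    """Peak SNR in dB between two images of equal shape (assumed definition)."""
    mse = np.mean((np.asarray(original, float) - np.asarray(coded, float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```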

7.3.1 Transform and Lapped Transform Coding of Images 

We have introduced block transforms in Section 3.4.1, and while they are a par- 
ticular case of filter banks (with filter length L equal to the downsampling factor 
N), they are usually considered separately. Their importance in practical image 
coding applications is such that a detailed treatment is justified. As we mentioned 
in audio coding examples, lapped orthogonal transforms are also filter bank expan- 
sions since they use modulated filter banks with filters of length typically twice 
the downsampling factor, or L = 2N. They have been introduced as an extension 
of block transforms in order to solve the problem of blocking in transform coding. 
Because of this close relationship between block transforms and lapped transforms, 
quantization and entropy coding for both schemes are usually very similar. A text 
on transform coding of images is [54], and lapped transform coding is treated in 
[188]. 

Block Transforms Recall that unitary block transforms of size N x N are defined 
by N orthonormal basis vectors, that is, the transform matrix T has these basis 
vectors as its rows (see Section 3.4.1 and (7.1.4)). For two-dimensional signals, one 
usually takes a separable transform which corresponds to the Kronecker product of $T$ with itself, $T \otimes T$.

Figure 7.13 Standard image used for the image compression experiments, called Barbara. The size is 512 x 512 pixels and 8 bits/pixel.

In other words, this separable transform can be evaluated by taking one-dimensional transforms along the rows and columns of a block $B$ of an image. This can be written as:

$$B_T = T\,B\,T^T,$$
where the first product corresponds to transforming the columns, while the second 
product computes the transform on rows of the image block. Many transforms have 
been proposed for the coding of images. Besides the DCT given in (7.1.10-7.1.11), the sine, slant, Hadamard and Haar transforms are common candidates, the last two mainly because of their low computational complexity (only additions and subtractions are involved). All of these transforms have fast, $O(N \log N)$ algorithms, as opposed to the optimal KLT, which has $O(N^2)$ complexity and is signal dependent. The performance of the DCT in image compression is sufficiently close to that of the KLT, and superior to that of the other transforms, that it has become the standard
transform. Figure 7.14 shows the 8x8 DCT transform of the original image. Note 
the two representations shown. In part (a), we display the transform of each block 
of the image, while part (b) has gathered all coefficients of the same frequency into 
a block. This latter representation is simply a subband interpretation of the DCT; 
for example, the lowest left corner is the output of a filter which takes the average 
of 8 x 8 blocks. The similarity of this representation with subband-decomposed 
images is obvious. Note that for quantization and entropy coding purposes, the 
representation (a) is preferred. 
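The following is a minimal sketch of the separable block DCT and of the subband reordering in Figure 7.14(b). It assumes the orthonormal DCT-II as the matrix T. It computes $T B T^T$, checks the result against $T \otimes T$ acting on the row-major vectorized block, and gathers same-frequency coefficients from all blocks. The helper names are hypothetical. In array indexing the DC subband ends up in the top-left corner rather than the lower left as displayed in the figure.

```python
import numpy as np

def dct_matrix(N: int = 8) -> np.ndarray:
    """Orthonormal DCT-II matrix; row k is the k-th basis vector."""
    n = np.arange(N)
    k = n[:, None]
    T = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    T[0, :] /= np.sqrt(2.0)  # rescale the DC row so that T is orthonormal
    return T

def block_dct2(B: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Separable 2-D transform: T @ B transforms columns, (.) @ T.T transforms rows."""
    return T @ B @ T.T

def blocks_to_subbands(C: np.ndarray, N: int = 8) -> np.ndarray:
    """Collect coefficient (k, l) of every N x N block into one 'subband' image."""
    H, W = C.shape
    return C.reshape(H // N, N, W // N, N).transpose(1, 0, 3, 2).reshape(H, W)

T = dct_matrix(8)
rng = np.random.default_rng(0)
B = rng.integers(0, 256, size=(8, 8)).astype(float)
BT = block_dct2(B, T)
# Same result via the Kronecker product T (x) T on the row-major vectorized block.
BT_kron = (np.kron(T, T) @ B.ravel()).reshape(8, 8)
print(np.allclose(BT, BT_kron))  # True

# Block-by-block transform of a small image, then the subband view.
img = rng.random((16, 16))
C = np.zeros_like(img)
for p in range(0, 16, 8):
    for q in range(0, 16, 8):
        C[p:p + 8, q:q + 8] = block_dct2(img[p:p + 8, q:q + 8], T)
S = blocks_to_subbands(C)  # the top-left 2 x 2 of S holds the four DC coefficients
```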
Figure 7.14 8x8 DCT transform of the original image. On the left is the usual block-by-block representation and on the right is the reordering of the coefficients so that same frequencies appear together (subband interpretation of DCT). The lowest frequency is in the lower left corner.



The quantization in the DCT domain is usually scalar and uniform. The lowest 
two-dimensional frequency component, called the DC coefficient, is treated with 
particular care. According to (7.1.10), it corresponds to the local average of the 
block. Mismatches between blocks often lead to the feared blocking effect, that 
is, the boundaries between the blocks become visible, a visually annoying artifact. 
Because the DC coefficient has the highest energy, a fine scalar quantization leads 
to a large entropy. Also, as can be seen in Figure 7.14(b), there is still high correla- 
tion among DC coefficients (it resembles the original image). Therefore, predictive 
quantization of the DC coefficients, such as DPCM, is often used to increase
compression without increasing distortion. 
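The following is a minimal sketch of closed-loop DPCM on a sequence of DC coefficients. Each DC value is predicted from the previously reconstructed one, so encoder and decoder stay in step. The DC values and the step size are hypothetical. The prediction errors the entropy coder sees are small except for the first one, and every reconstruction is within half a step of the original.

```python
import numpy as np

def dpcm_encode(dc: np.ndarray, step: float) -> np.ndarray:
    """Closed-loop DPCM: quantize the error between each DC value and its prediction."""
    indices = np.empty(len(dc), dtype=int)
    prediction = 0.0
    for i, value in enumerate(dc):
        q = int(np.round((value - prediction) / step))  # quantized prediction error
        indices[i] = q
        prediction = prediction + q * step  # decoder's reconstruction = next prediction
    return indices

def dpcm_decode(indices: np.ndarray, step: float) -> np.ndarray:
    """Accumulate the dequantized prediction errors."""
    return np.cumsum(indices * step)

dc = np.array([812.0, 820.5, 815.25, 790.0, 805.0])  # hypothetical DC values of adjacent blocks
step = 4.0
idx = dpcm_encode(dc, step)
print(idx, dpcm_decode(idx, step))
```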

The choice of the quantization steps for the various coefficients of the DCT is 
a classic bit-allocation problem, since distortion and rate are additive. However, 
perceptual factors are very important and careful experiments lead to quantization 
matrices which take into account the visibility of errors (besides the variance and en- 
tropy of the coefficients). While this has the flavor of a weighted MSE bit-allocation 
method, it relies heavily on experimental results. An example quantization matrix, 
showing the quantizer step sizes used for various DCT coefficients in JPEG, is given 
in Table 7.3 [148]. What is particularly important is the relative size of the steps, because within a certain range one can scale this quantization matrix, that is, multiply all step sizes by a scale factor greater or smaller than one in order to reduce or increase the bit rate, respectively. This scale factor is very useful for adaptive quantization, where the bit allocation is made between blocks which have various energy levels. One can then think of this scale factor as a "super" quantizer step, and the goal is to choose the sequence of scale factors that minimizes the total distortion for a given bit budget. Each block has its own rate-distortion function and thus, the scale factors can be chosen according to the constant-slope rule described in Section 7.1.2. Sometimes, scale factors are fixed for a group of blocks (called a macroblock) in order to reduce the overhead.

Table 7.3 Example of a quantization matrix as used in DCT transform coding in JPEG [148]. The entries are the step sizes for the quantization of the coefficient (i,j). Note that the relative step sizes are what is critical, since the whole matrix can be multiplied by an overall scale factor. The lowest frequency or DC coefficient is in the upper left corner.
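The constant-slope rule can be sketched as follows. Each block independently minimizes $D + \lambda R$ over its candidate scale factors, and we then pick the smallest slope $\lambda$ whose total rate fits the budget. The (rate, distortion) operating points are toy numbers, not measurements. Like any Lagrangian method, this finds points on the convex hull of the operating points, which need not be the exact constrained optimum.

```python
from itertools import combinations

def choose_for_slope(blocks, lam):
    """Each block minimizes D + lam * R on its own; ties go to the lower rate."""
    return [min(range(len(pts)), key=lambda j: (pts[j][1] + lam * pts[j][0], pts[j][0]))
            for pts in blocks]

def allocate(blocks, budget):
    """Smallest common slope lam whose choices meet the rate budget."""
    slopes = {0.0}
    for pts in blocks:  # candidate slopes: |dD/dR| between any two points of one block
        for (r1, d1), (r2, d2) in combinations(pts, 2):
            if r1 != r2:
                slopes.add(abs((d2 - d1) / (r2 - r1)))
    for lam in sorted(slopes):
        choice = choose_for_slope(blocks, lam)
        if sum(pts[j][0] for pts, j in zip(blocks, choice)) <= budget:
            return lam, choice
    raise ValueError("budget too small for any combination of operating points")

# Hypothetical (rate in bits, distortion) per candidate scale factor, for two blocks.
blocks = [
    [(120, 10.0), (80, 40.0), (50, 120.0)],  # high-energy block
    [(60, 5.0), (35, 12.0), (20, 30.0)],     # low-energy block
]
print(allocate(blocks, budget=130))
```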

Of course, bit allocation is done by taking entropy coding into account, which we describe next. As in subband coding, higher frequency coefficients have lower energy and thus a high probability of being zero after quantization. In particular, the conditional probability that a high-frequency coefficient is zero, given that its predecessors are zero, is close to one. Therefore, there will be runs of zeros, in particular up to the terminal coefficient. To take better advantage of this phenomenon
in a two-dimensional transform, an ordering of the coefficients called zig-zag scan- 
ning is used (see Figure 7.15(a)). Very often, a long stretch of zeros terminates 
the sequence (see Figure 7.15(b)) and then an "end of block" (EOB) can be sent 
instead. The nonzero values and the run lengths are entropy coded (typically using 
Huffman or arithmetic codes). 
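The zig-zag ordering and the end-of-block shortcut can be sketched as follows. Coefficients are visited along anti-diagonals of increasing frequency, alternating direction, with the same convention as JPEG: (0,0), (0,1), (1,0), (2,0), and so on. The trailing run of zeros is then replaced by a single "EOB" marker. The function names and the example block are hypothetical.

```python
import numpy as np

def zigzag_order(N: int = 8):
    """(row, col) pairs in zig-zag order: anti-diagonal i + j, alternating direction."""
    cells = [(i, j) for i in range(N) for j in range(N)]
    # Odd diagonals run top-right to bottom-left (row increasing), even ones the reverse.
    return sorted(cells, key=lambda c: (c[0] + c[1], c[0] if (c[0] + c[1]) % 2 else -c[0]))

def scan_with_eob(block: np.ndarray):
    """Zig-zag scan a quantized block; replace the trailing zeros by 'EOB'."""
    seq = [int(block[i, j]) for i, j in zigzag_order(block.shape[0])]
    nonzero = [k for k, v in enumerate(seq) if v != 0]
    last = nonzero[-1] if nonzero else -1
    tail = ["EOB"] if last < len(seq) - 1 else []  # no EOB if the last coefficient is nonzero
    return seq[: last + 1] + tail

q = np.zeros((8, 8), dtype=int)  # hypothetical quantized block
q[0, 0], q[0, 1], q[1, 0], q[2, 0] = 26, -3, 1, -2
print(scan_with_eob(q))  # [26, -3, 1, -2, 'EOB']
```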

Note that DCT coding is used not only on images, but also in video cod- 
ing. While the same principles are used, specific quantization and entropy coding 
schemes have to be developed, as will be seen in Section 7.4.2. 

Figure 7.15 Zig-zag scanning of 8 x 8 DCT coefficients. (a) Ordering of the coefficients. DC stands for the average or constant component, while AC stands for the higher frequencies. (b) Typical sequence of quantized and zig-zag scanned DCT coefficients.

The coding of color images is performed on a component-by-component basis, that is, after transformation into an appropriate color space such as one luminance and two chrominance components. The components are coded individually, with a lesser weighting of the errors in the chrominance components.



Overlapping Block Transforms Lapped orthogonal transforms (see also Sec- 
tion 3.4.1) were developed specifically to solve the blocking problem inherent to 
block transforms. Rather than having a hard transition from one block to the next, 
they smooth out the boundary with an overlapping window [44, 188, 189]. 
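The following sketch illustrates the key property of lapped transforms using the modulated lapped transform (MLT) with a sine window. The MLT is one member of the lapped orthogonal transform family; it is not necessarily the DCT-like LOT of [188]. Each basis function has length 2N and overlaps its neighbors by N samples, yet the overall transform is orthogonal, giving perfect reconstruction without a hard block boundary. Extending the signal circularly is our own assumption, made to keep the matrix square; the function names are hypothetical.

```python
import numpy as np

def mlt_basis(N: int) -> np.ndarray:
    """N x 2N matrix whose rows are MLT basis functions of length 2N."""
    n = np.arange(2 * N)
    k = np.arange(N)[:, None]
    w = np.sin(np.pi * (n + 0.5) / (2 * N))  # symmetric, and w[n]**2 + w[n + N]**2 == 1
    return np.sqrt(2.0 / N) * w * np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

def lapped_matrix(N: int, num_blocks: int) -> np.ndarray:
    """Transform of a circularly extended signal of length N * num_blocks (num_blocks >= 2).
    Block b uses samples b*N ... b*N + 2N - 1 (mod length), so neighbors overlap by N."""
    L = N * num_blocks
    P = mlt_basis(N)
    T = np.zeros((L, L))
    for b in range(num_blocks):
        cols = (b * N + np.arange(2 * N)) % L
        T[b * N:(b + 1) * N, cols] = P
    return T

T = lapped_matrix(8, 4)
x = np.random.default_rng(2).normal(size=32)
y = T @ x  # 8 coefficients per block, 4 blocks
print(np.allclose(T @ T.T, np.eye(32)), np.allclose(T.T @ y, x))  # True True
```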

For image coding applications, the LOT basis functions are designed so as to 
resemble the DCT basis functions and thus, the behavior of lapped orthogonal 
transform coefficients is very similar to that of DCT coefficients. That is, the 
DCT quantization and entropy coding strategies will work well in LOT encoding 
of images as well. 

While it is true that blocking effects are reduced in LOT compressed images, 
other artifacts tend to appear, such as increased ringing around edges due to longer 
basis functions. Because the blocking effect with the LOT is reduced, one can use more channels, that is, larger blocks (16 x 16), and achieve better compression.

The LOT represents an elegant extension of the DCT; however, it has not yet been successful in dislodging it. One of the reasons is that the improvements are not sufficient to justify the increase in complexity. While the LOT has a fast, $O(N \log N)$ algorithm, the structure is more involved since blocks now interact with neighbors. While this small increase in complexity is not much of a problem in software, it has so far made LOTs less attractive in VLSI implementations.

Example: JPEG Image Coding Standard To describe a transform coding ex- 
ample, we will discuss the JPEG industry standard [148, 327]. While it is not the 
most sophisticated transform coder, its simplicity and good performance (for the 
type of imagery and bit rate it has been designed for) made it very popular. The 
availability of special-purpose hardware implementing JPEG at high rates (such as 30 frames per second) has further established this standard, both in still image and in intraframe video compression (see the next section).

An important point is that the JPEG image compression standard specifies only 
the decoder, thus allowing for possible improvements of the encoder. The JPEG 
standard comprises several options or modes of operation [327]: 

(a) Sequential encoding: block-by-block encoding in scan order. 

(b) Progressive encoding: geared toward progressive transmission, or successive approximation. To achieve higher-resolution pictures, it uses either more and
more DCT coefficients, or more and more bits/coefficient. 

(c) Hierarchical encoding: a lower-resolution image is encoded first, upsampled 
and interpolated to predict the full resolution and the difference or prediction 
error is encoded with one of the other JPEG modes. This is really a pyramid coder (see Section 7.3.2) that uses JPEG on the difference signal.

(d) Lossless encoding: this mode actually does not use the DCT, but predictive 
encoding based on a causal neighborhood of three samples. 

We will only discuss the sequential encoding mode in its simplest version which 
is called the baseline JPEG coder. It uses an 8 x 8 DCT, which was found to
be a good compromise between coding efficiency (large blocks) and avoidance of 
blocking effects (small blocks). This holds true for the typical imagery and bit 
rates for which JPEG is designed, such as the 512 x 512 Barbara image compressed 
to 0.5 bits/pixel. Note that other types of imagery might use other DCT sizes. 

The input is assumed to be 8 bits (typical for regular images) or 12 bits (typical 
for medical images). Color components are treated separately. After the DCT, the quantization uses a carefully designed set of uniform quantizers. Their step sizes are stored in a quantization table, where each entry is an integer belonging to the set {1, ..., 255}. An example was shown in Table 7.3. Quantization is performed by rounding the DCT coefficient divided by the step size to the nearest integer. At the decoder, this rounded value is simply multiplied by the step size. Note that the quantization tables are based on visual experiments, but since they can be specified by the user, they are not part of the standard.
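The quantize/dequantize step just described can be sketched as follows, including the overall scale factor discussed earlier. As the step table we use the example luminance table from Annex K of the JPEG specification (ITU-T T.81). That table is informative, which is exactly why encoders may replace it; we do not claim it is the table shown in Table 7.3. Note that `np.round` rounds exact halves to even, unlike JPEG's usual rounding, but the half-step error bound is unaffected. The coefficients are random stand-ins for one block of DCT coefficients.

```python
import numpy as np

# Example luminance quantization table from Annex K of ITU-T T.81.
Q_LUMA = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def scaled_table(scale: float) -> np.ndarray:
    """Multiply all steps by one scale factor, kept within the allowed range {1, ..., 255}."""
    return np.clip(np.round(Q_LUMA * scale), 1, 255).astype(int)

def quantize(coeffs: np.ndarray, table: np.ndarray) -> np.ndarray:
    return np.round(coeffs / table).astype(int)  # nearest integer to coefficient / step

def dequantize(indices: np.ndarray, table: np.ndarray) -> np.ndarray:
    return indices * table  # decoder: index times step size

rng = np.random.default_rng(3)
coeffs = rng.normal(scale=50.0, size=(8, 8))  # stand-in for the DCT coefficients of one block
for scale in (0.5, 1.0, 2.0):
    print(scale, np.count_nonzero(quantize(coeffs, scaled_table(scale))))
```

A larger scale factor can only leave the same number of nonzero indices or fewer, which is how it lowers the bit rate.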

Zig-zag scanning follows quantization and finally entropy coding is performed. 
First, the DC coefficient (proportional to the average of the 64 samples) is differentially encoded, that is, $\Delta_i = DC_i - DC_{i-1}$ is entropy coded. This removes some of the correlation left between DC coefficients of adjacent blocks. Then, the sequence of remaining DCT coefficients is entropy coded. Because of the high probability of stretches
of consecutive zeros, run-length coding is used. A symbol pair (L, A) specifies the 
length of the run (0 to 15) and the amplitude range (number of bits, 0, . . . , 10) of the 
following nonzero value. Then follows the nonzero value (which has the previously 
specified number of bits). For example, (15,7) would mean that we have 15 zeros 
followed by a number requiring seven bits. 

Runs longer than 15 samples simply use a value A equal to zero, signifying con- 
tinuation of the run, and the pair (0, 0) stands for end of block (no more nonzero 
values in this block). Finally, the pairs (L, A) are Huffman coded with a table spec- 
ified by the user (default tables are suggested, but can be replaced). The nonzero 
values following a run of zeros are now so-called variable-length integers specified 
by the preceding value A. These are not Huffman coded because of insufficient gain 
in view of the complexity. 
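The formation of symbols can be sketched as follows: the DC differences $\Delta_i$ and the (L, A) pairs for the AC coefficients of one zig-zag-scanned block. A run of more than 15 zeros is continued with the pair (15, 0), which stands for 16 zeros (JPEG calls this symbol ZRL). Trailing zeros become the end-of-block pair (0, 0). The value is kept here as a plain integer next to its pair. The Huffman codes and the exact bit pattern JPEG uses for negative values are not modeled, and the function names are hypothetical.

```python
from typing import List, Tuple

def size_category(v: int) -> int:
    """Bits needed for |v| (the amplitude range A); 0 only for v == 0."""
    return abs(v).bit_length()

def dc_differences(dcs: List[int]) -> List[int]:
    """Delta_i = DC_i - DC_(i-1), taking DC_(-1) = 0 for the first block."""
    return [d - p for d, p in zip(dcs, [0] + dcs[:-1])]

def ac_symbols(zz: List[int]) -> List[Tuple[int, int, int]]:
    """(L, A, value) triples for the 63 AC coefficients of one zig-zag-scanned block."""
    out, run = [], 0
    for v in zz[1:]:              # zz[0] is the DC coefficient, coded separately
        if v == 0:
            run += 1
            continue
        while run > 15:           # run too long for one pair: emit (15, 0) = 16 zeros
            out.append((15, 0, 0))
            run -= 16
        out.append((run, size_category(v), v))
        run = 0
    if run > 0:                   # only zeros remain: end of block
        out.append((0, 0, 0))
    return out

zz = [40, 5, 0, 0, -3] + [0] * 17 + [1] + [0] * 41  # hypothetical 64-entry scanned block
print(ac_symbols(zz))
print(dc_differences([40, 42, 39]))
```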

The decoder now operates as follows: Based on the Huffman coding table, 
it entropy decodes the incoming bit stream, and using the quantization table, it 
"dequantizes" the transform domain values. Finally, an inverse DCT is applied to 
reconstruct the image. 

Figure 7.16 schematically shows a JPEG encoder. An example of the Barbara 
image coded with the baseline JPEG algorithm is shown in Figure 7.17 at the rate 
of 0.5 bits/pixel and $\mathrm{SNR}_p$ = 28.26 dB.

7.3.2 Pyramid Coding of Images 

A simple, yet powerful image representation scheme for image compression is the 
pyramid scheme of Burt and Adelson [41] (see Section 3.5.2). From an original 
image, derive a coarse approximation, for example, by lowpass filtering and down- 
sampling. Based on this coarse version, predict the original (by upsampling and 
filtering) and calculate the difference as the prediction error. Instead of the original 
image, one can compress the coarse version and the prediction error. If the predic- 
tion is good (which will be the case for most natural images which have a lowpass 
characteristic), the error will have a small variance and can thus be well compressed. 
Of course, the process can be iterated on the coarse version. Figure 7.18 shows such 
a pyramid scheme. Note how perfect reconstruction, in absence of quantization of 
the difference signal, is simply obtained by adding back at the decoder the predic- 
tion which was subtracted at the encoder. 
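The following is a minimal one-step sketch of the pyramid scheme. The choices of D (2 x 2 averaging followed by downsampling) and I (pixel replication) are hypothetical; Burt and Adelson use smoother lowpass filters. Without quantization, adding the prediction back recovers the original exactly. For a smooth input such as a ramp, the difference image has a much smaller variance than the original. Note that the coarse and difference images together hold more samples than the original.

```python
import numpy as np

def derive_coarse(x: np.ndarray) -> np.ndarray:
    """D: average 2 x 2 blocks and downsample (a simple, hypothetical choice)."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

def interpolate(xc: np.ndarray) -> np.ndarray:
    """I: upsample by pixel replication (a simple, hypothetical choice)."""
    return np.repeat(np.repeat(xc, 2, axis=0), 2, axis=1)

def pyramid_step(x: np.ndarray):
    xc = derive_coarse(x)
    xp = interpolate(xc)   # prediction of the original from the coarse version
    return xc, x - xp      # coarse version and prediction error

def reconstruct(xc: np.ndarray, xd: np.ndarray) -> np.ndarray:
    return interpolate(xc) + xd  # add back the prediction

ramp = np.add.outer(np.arange(8.0), np.arange(8.0))  # smooth test image
xc, xd = pyramid_step(ramp)
print(xc.shape, xd.shape, ramp.var(), xd.var())
```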
Figure 7.16 Transform coding following the JPEG standard. The encoder is shown. The decoder performs entropy decoding, inverse quantization and an inverse DCT (after [327]).

Figure 7.17 Example of a transform-coded Barbara using the JPEG standard. The image has 512 x 512 pixels, the target rate is 0.5 bits/pixel and $\mathrm{SNR}_p$ = 28.26 dB.



Quantization Noise Refer to Figure 7.18. Because the prediction $x_p$ is based on the quantized coarse version $\hat{x}_c$ (rather than $x_c$ itself), the only source of quantization error in the reconstructed signal is the one due to the quantizer $Q_d$. Since $\hat{x}_d = x_d + e_d$, where $e_d$ is the error due to the quantizer $Q_d$, we find that

$$\hat{x} = \hat{x}_d + x_p = x_d + e_d + x_p = x + e_d,$$

where we used the fact that $x = x_d + x_p$ in a pyramid coder. This is important if one is interested in the maximum error introduced by coding. In the pyramid case, it will simply be the maximum error of the quantizer $Q_d$ (typically half the largest quantization interval). The property holds also for multilevel pyramids if one uses quantization error feedback [303]. As can be seen from Figure 7.19, the trick is to use only quantized coarse versions in the prediction of a finer version. Thus, the same prediction can be obtained in the decoder as well, and the source of quantization noise can be limited to the last quantizer (the one applied to the finest difference signal). Note that quantization error feedback requires the reconstruction of the intermediate coarse version $\hat{x}_{c_1}$ in the encoder, and is thus more complex than an encoder without feedback and adds encoding delay.

Figure 7.18 One-step pyramid coding. Both encoding and decoding are shown. Note that only the quantization of the difference signal contributes to the reconstruction error. D stands for deriving a coarse version, and I stands for interpolation.
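The error property of the one-step pyramid in Figure 7.18 can be checked numerically. In the sketch below, the encoder forms its prediction from the quantized coarse version, exactly as the decoder will. The reconstruction error is therefore the error of the difference quantizer alone, bounded by half its step, however coarsely $Q_c$ quantizes. The operators D and I, the uniform quantizers and the step sizes are hypothetical choices.

```python
import numpy as np

def derive_coarse(x):
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))  # D: 2 x 2 average

def interpolate(xc):
    return np.repeat(np.repeat(xc, 2, axis=0), 2, axis=1)     # I: pixel replication

def quantize(v, step):
    return step * np.round(v / step)                          # uniform quantizer

def encode(x, step_c, step_d):
    xc_hat = quantize(derive_coarse(x), step_c)  # quantized coarse version
    xp = interpolate(xc_hat)                     # prediction uses xc_hat, not xc
    xd_hat = quantize(x - xp, step_d)            # quantized difference signal
    return xc_hat, xd_hat

def decode(xc_hat, xd_hat):
    return interpolate(xc_hat) + xd_hat          # same prediction as in the encoder

rng = np.random.default_rng(4)
x = rng.uniform(0, 255, size=(16, 16))
for step_c in (1.0, 32.0):
    err = np.abs(decode(*encode(x, step_c, step_d=8.0)) - x).max()
    print(step_c, err)  # never exceeds step_d / 2 = 4.0, whatever step_c is
```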



Decimation and Interpolation Operators In Figures 7.18 and 7.19, we used 
boxes labeled D and I to denote operators that derive the coarse version and inter- 
polate the fine version, respectively. While these operators are often linear filters, 
as in the original Burt and Adelson scheme [41], nothing prohibits the use of nonlinear operators [9]. While such generalized operators have not often been used so
far, they represent a real potential for pyramid coding. For example, sophisticated 
methods based on edges could be used to get very rough coarse versions, as long as 
the prediction reduces the variance of the difference signal sufficiently. 

Another attractive feature of this freedom in choosing the operators is that 
visually pleasing coarse versions are easy to obtain. This is because the filters used 
for decimation and interpolation, unlike in the subband case, are unconstrained. 
Typically, zero-phase FIR filters are used where medium lengths already achieve 
good lowpass behavior and visually good looking coarse versions. 
Figure 7.19 Quantization noise feedback in a two-step pyramid. Only the encoder is shown. Note that a decoder is part of the encoder in order to make predictions based on quantized versions only.



Oversampling A drawback of pyramid coding is the implicit oversampling. As- 
sume we start with an N x N image. After one step, we have an N/2 x N/2 coarse 
version, but also an N x N difference image. If the scheme is iterated we have the 
following number of samples: 



$$N^2\left(1 + \frac{1}{4} + \frac{1}{16} + \cdots\right) < \frac{4}{3}N^2,$$

as was given in (3.5.4). This oversampling of up to 33% has often been considered
as a drawback of pyramid coding (in one dimension, the overhead is 100% and thus 
a real problem). However, it does not prohibit efficient coding a priori and the 
other attractive features such as the control of quantization noise, quality of coarse 
pictures, and robustness counterbalance the oversampling problem. 
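The oversampling arithmetic can be checked with a short count. The sketch stores one difference signal per level plus the coarsest version, for an $N^d$ signal. It assumes that N is divisible by $2^{\text{levels}}$; the function name is hypothetical.

```python
def pyramid_samples(N: int, levels: int, dims: int = 2) -> int:
    """Samples stored by a pyramid: `levels` difference signals plus the coarsest version."""
    total, size = 0, N
    for _ in range(levels):
        total += size ** dims  # difference signal at this resolution
        size //= 2
    return total + size ** dims

for dims, bound in ((2, 4 / 3), (1, 2.0)):
    ratio = pyramid_samples(512, 5, dims) / 512 ** dims
    print(dims, ratio, "<", bound)
```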



Bit Allocation The problem of allocating bits to the various quantizers is tricky in 
pyramid coders, especially when quantization noise feedback is present. The reason 
is that the independence assumption used in the optimal bit allocation algorithm 
derived in Section 7.1.2 does not hold. Consider Figure 7.18 and assume a choice of quantizers for $Q_c$ and $Q_d$. Because the choice for $Q_c$ influences the prediction $x_p$ and thus the variable to be quantized, $x_d$, there is no independence between the choices for $Q_c$ and $Q_d$. For example, increasing the step size of $Q_c$ not only increases the distortion of $x_c$, but also that of $x_d$ (since its variance will probably increase). Thus, in the worst case, one might have to search all possible pairs of quantizers for $x_c$ and $x_d$ and find the best performing pair given a certain bit budget. This search grows exponentially as the number of levels increases, since there are $K^l$ possible $l$-tuples of quantizers, where $K$ is the number of quantizers at every level and $l$ is the number of levels. Even if quantization error feedback is not used, there is a complication because the total squared error is not the sum of the squared errors $e_c$ and $e_d$ (see (7.1.16)), since the pyramid decomposition is not unitary (unless an ideal lowpass filter is assumed). A discussion of dependent quantization and its application to pyramid coding can be found in [232].

7.3.3 Subband and Wavelet Coding of Images 

The generalization of subband decomposition to multiple dimensions is straight- 
forward, especially in the separable case [314]. The application to compression of 
images has become popular [1, 111, 265, 330, 332, 335, 337]. The nonseparable 
multidimensional case, using quincunx [314] or hexagonal downsampling [264], as 
well as directional decompositions [19, 287], has also found applications in image 
compression. Recently, using filters specifically designed for regularity, methods 
closely related to subband coding have been proposed under the name of wavelet 
coding [14, 79, 81, 101, 176, 244, 260, 341]. The main difference from pyramid
coding, discussed in Section 7.3.2, is that we have a critically sampled scheme and 
often an orthogonal decomposition. The price paid is more constrained filters in 
the decomposition, which leads to poorer coarse resolution pictures in general. In 
what follows, we discuss various forms of subband and wavelet compression schemes 
tailored to images. 

Separable Decompositions We will call separable decompositions those which 
use separable downsampling. Usually, they also use separable filters (but this is not 
necessary). When both downsampling and filters are separable, the implementation 
is very efficient since it can be done on rows and columns separately, at least at 
each stage of the decomposition. 

While being constrained, separable systems are often favored because of their 
computational efficiency with separable filters, since size-$N \times N$ filters lead to order $N$ rather than $N^2$ operations/input sample (see Section 6.2.4). Conceptually,
separable systems are also much easier to implement since they are cascades of 
one-dimensional systems. However, from the fact that the two-dimensional filters 
are products of one-dimensional filters, it is clear that only rectangular pieces of 
the spectrum can be isolated. 
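The following sketch shows a separable decomposition as a cascade of one-dimensional stages. One level of an orthonormal Haar split is applied along the columns and then along the rows, producing the four subbands. The Haar filter pair is a hypothetical choice; any two-channel filter bank could take its place. In the subband names, the first letter refers to filtering along the columns (vertical) and the second to filtering along the rows (horizontal).

```python
import numpy as np

def analysis_1d(x: np.ndarray, axis: int):
    """One-level orthonormal Haar split along `axis` (even length assumed)."""
    x = np.moveaxis(x, axis, 0)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2)
    high = (even - odd) / np.sqrt(2)
    return np.moveaxis(low, 0, axis), np.moveaxis(high, 0, axis)

def separable_split(img: np.ndarray):
    """Filter and downsample the columns (axis 0), then the rows (axis 1)."""
    L, H = analysis_1d(img, 0)
    LL, LH = analysis_1d(L, 1)
    HL, HH = analysis_1d(H, 1)
    return LL, LH, HL, HH

img = np.random.default_rng(5).normal(size=(8, 8))
LL, LH, HL, HH = separable_split(img)
print([b.shape for b in (LL, LH, HL, HH)])
```

Because each stage is orthonormal, the energy of the four subbands equals that of the image.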
Figure 7.20 Sublattices of $\mathbb{Z}^2$ and shapes of possible ideal lowpass filters (corresponding to the Voronoi cell of the dual lattice, which is indicated as well). (a) Separable sublattice $D_S$. (b) Quincunx $D_Q$. (c) Hexagonal $D_H$.



Nonseparable Decompositions Recall that coding gain in subband coding was 
maximized when the variances in the channels were as different as possible (see 
Section 7.1.2). If one assumes that images have a power spectrum that is roughly 
rotationally-invariant and decreases with higher frequencies, then it is clear that 
separable systems are not best suited for isolating a lowpass channel containing 
most energy and having highpass channels with low energy. A better solution is 
found by opting for nonseparable systems. The two most important systems for 
image processing are based on the quincunx [314] and hexagonal downsamplings 
[264], for two- and four-channel subband coding systems, respectively. Quincunx and hexagonal sublattices of $\mathbb{Z}^2$ are shown in Figure 7.20, together with the more conventional separable sublattice. They correspond to integer linear combinations of the columns of the following matrices⁶:

$$D_S = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}, \qquad D_Q = \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}, \qquad D_H = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix},$$

where the sampling density is reduced by a factor of four for the separable sampling, by two for the quincunx sampling (see also Appendix 3.B) and by four for the hexagonal sampling. The spectral repetitions caused by downsampling appear on the dual lattice, which is given by the inverse transpose of the lattice matrix. Also shown in Figure 7.20 are possible ideal lowpass filters that will avoid aliasing when downsampling to these sublattices.

6 Recall from Appendix 3.B that a given sampling lattice may have infinitely many matrix representations.

Figure 7.21 Frequency decomposition of iterated quincunx scheme.

If, as we said, images
have circularly symmetric power spectrums that decrease with higher frequencies, 
then the quincunx lowpass filter will retain more of the original signal's energy than 
a separable lowpass filter (which would be one-dimensional since the downsampling 
is by two). Using the same argument, the hexagonal lowpass filter is then better 
than the corresponding lowpass filter in a separable system with downsampling by 
two in each dimension. Thus, these nonseparable systems, while being more difficult 
to design and more complex to implement, represent a better match to usual image 
spectrums. 
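The three sampling lattices can be checked numerically. We take $D_S = \mathrm{diag}(2, 2)$, $D_Q = \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}$ and $D_H = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}$; recall that other matrix representations of the same lattices exist. The sketch computes the density reduction $|\det D|$ and the dual-lattice matrix $D^{-T}$, and tests lattice membership. The helper name `on_lattice` is hypothetical.

```python
import numpy as np

D_S = np.array([[2, 0], [0, 2]])
D_Q = np.array([[2, 1], [0, 1]])
D_H = np.array([[2, 1], [0, 2]])

def on_lattice(D: np.ndarray, point) -> bool:
    """True if `point` is an integer combination of the columns of D."""
    coeffs = np.linalg.solve(D, np.asarray(point, dtype=float))
    return bool(np.allclose(coeffs, np.round(coeffs)))

for name, D in (("separable", D_S), ("quincunx", D_Q), ("hexagonal", D_H)):
    density_reduction = round(abs(np.linalg.det(D)))
    dual = np.linalg.inv(D).T  # spectral repetitions lie on 2*pi times this lattice
    print(name, density_reduction, dual.round(3).tolist())
```

With this $D_Q$, the quincunx lattice is the set of points $(n_1, n_2)$ with $n_1 + n_2$ even.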

Furthermore, the simple quincunx case has the following perceptual advantage: 
The human visual system is more sensitive to horizontal and vertical high frequencies than to diagonal ones. The lowpass filter in Figure 7.20(b) conserves horizontal and vertical frequencies, while it cuts off diagonals to half of their original range. This is a good match to the human eye and often, the highpass channel (which is complementary to the lowpass channel) can be disregarded altogether. That is, a compression by a factor of two can be achieved with no visible degradation. Such preprocessing has been used in intraframe coding of HDTV [12]. The above quincunx scheme is often iterated on the lowpass channel, leading to a frequency decomposition as shown in Figure 7.21. This actually corresponds to a two-dimensional
nonseparable wavelet decomposition [163] and has been used for image compression 
[14]. 
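A small numerical check shows why a quincunx lowpass must remove the highest diagonal frequencies. Quincunx downsampling keeps only the samples with $n_1 + n_2$ even. The checkerboard $\cos(\pi(n_1 + n_2))$, whose frequency $(\pi, \pi)$ is purely diagonal, then becomes a constant: it aliases onto DC. The function name is hypothetical.

```python
import numpy as np

def quincunx_downsample(x: np.ndarray) -> np.ndarray:
    """Keep the samples with n1 + n2 even (the quincunx lattice), as a 1-D array."""
    n1, n2 = np.indices(x.shape)
    return x[(n1 + n2) % 2 == 0]

n1, n2 = np.indices((8, 8))
diagonal = np.cos(np.pi * (n1 + n2))  # checkerboard at frequency (pi, pi)
kept = quincunx_downsample(diagonal)
print(kept.size, np.unique(np.round(kept, 12)))  # 32 samples, all equal to 1
```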

The hexagonal system, besides having a fairly good approximation to a circu- 
larly symmetric lowpass, has three directional channels which can be used to detect 
directional edges [264]. However, the goal of an isotropic analysis is only approx- 
imated, since the horizontal and vertical directions are not treated in the same 
manner (see Figure 7.20(c)). Therefore, it is not clear if the added complexity of a 
nonseparable four-channel system based on the hexagonal sublattice is justified for 
coding purposes. 
Choice of Filters Unlike in audio compression, the filters for image subband coding do not need high out-of-band rejection. Instead, a number of other constraints have to be satisfied.

Linear phase In regular image filtering, the need for linear phase is well-known since 
without linear phase, the phase distortion around edges is very visible. Therefore, 
the use of linear phase filters in subband coding has been often advocated [14]. 
Recall from Section 3.2.4 that in two-band FIR systems, linear phase and orthogonality are mutually exclusive (the Haar filters being the trivial exception), and this carries over to the four-band separable systems which are most often used in practice.

However, the case for linear phase is not as obvious as it seems at first sight. 
For example, in the absence of quantization, the phase of the filters has no bearing 
since the system has perfect reconstruction. This argument carries over for fine 
quantization as well. In the case of coarse quantization, the situation is more 
complex. One scenario is to consider the highpass channel as being set to zero. Since the resulting system is periodically time-varying with period two, it has two impulse responses (one for an impulse at an even position and one for an impulse at an odd position). Nonlinear phase systems lead to nonsymmetric responses, but so do some of the linear phase systems. Only if the filters meet additional constraints do the two impulse responses remain symmetric. Note also that for computational purposes, linear phase is more convenient because of the symmetry of the filters.
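The trade-off between linear phase and orthogonality can be checked on three familiar lowpass filters. Symmetry of the taps gives linear phase. Orthonormality to even shifts is the orthogonality condition for a two-channel bank. The three filters are our own illustrative choices: the Haar filter, the length-4 Daubechies filter, and the 5/3 (LeGall) analysis lowpass normalized to unit norm. Haar has both properties, Daubechies is orthogonal but not symmetric, and the 5/3 filter is symmetric but not orthogonal.

```python
import numpy as np

def is_symmetric(h: np.ndarray) -> bool:
    return bool(np.allclose(h, h[::-1]))

def is_orthonormal(h: np.ndarray) -> bool:
    """Check sum_n h[n] h[n - 2k] = delta[k]."""
    r = np.correlate(h, h, mode="full")  # autocorrelation; zero lag at index len(h) - 1
    mid = len(h) - 1
    return bool(np.isclose(r[mid], 1.0) and np.allclose(r[mid + 2::2], 0.0))

s3 = np.sqrt(3)
haar = np.array([1.0, 1.0]) / np.sqrt(2)
daub4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))
legall53 = np.array([-1.0, 2.0, 6.0, 2.0, -1.0])
legall53 /= np.linalg.norm(legall53)

for name, h in (("haar", haar), ("daub4", daub4), ("5/3", legall53)):
    print(name, "symmetric:", is_symmetric(h), "orthonormal:", is_orthonormal(h))
```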

Note that orthogonal FIR filters of sufficient length can be made almost linear 
phase by appropriate factorization of their autocorrelation function. Also, there 
are nonseparable orthogonal filters with linear phase. Finally, by resorting to IIR filters, one can have both linear phase and orthogonality, and such noncausal IIR
filters can be used in image processing without problems since we are dealing with 
finite-length input signals. 

Orthogonality Orthogonal filters implement a unitary transform between the input 
and the subbands. The usual features of unitary transforms hold, such as con- 
servation of energy. In particular, the total distortion is the sum of the subband 
distortions, or: 



$$D = \sum_i D_i, \qquad (7.3.1)$$

and the total bit rate is the sum of the subbands' bit rates. Therefore, optimal
bit-allocation algorithms which assume additivity of bit rate and distortion can be 
used (see Section 7.1.2). In the nonorthogonal case, (7.3.1) does not hold, and thus, 
these bit allocation algorithms cannot be used directly. It should be noted that well 
designed linear phase FIR filter banks (that is, with good out-of-band rejection) are 
often close to being orthogonal and thus satisfy (7.3.1) approximately. 
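Property (7.3.1) can be verified numerically. The sketch quantizes the two bands of a one-level split with different step sizes. It then compares the squared error in the signal domain with the sum of the squared errors in the subbands. The orthonormal Haar pair and the step sizes are hypothetical choices. For contrast, the sketch includes a made-up nonorthogonal pair that still reconstructs perfectly; for that pair the two numbers no longer agree.

```python
import numpy as np

def haar_analysis(x):
    e, o = x[0::2], x[1::2]
    return (e + o) / np.sqrt(2), (e - o) / np.sqrt(2)  # orthonormal split

def haar_synthesis(low, high):
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / np.sqrt(2)
    x[1::2] = (low - high) / np.sqrt(2)
    return x

def nonorth_analysis(x):  # hypothetical perfect-reconstruction, nonorthogonal pair
    e, o = x[0::2], x[1::2]
    return (e + o) / 2, e - o

def nonorth_synthesis(low, high):
    x = np.empty(2 * len(low))
    x[0::2] = low + high / 2
    x[1::2] = low - high / 2
    return x

def quantize(v, step):
    return step * np.round(v / step)

def distortions(analysis, synthesis, x, steps=(0.1, 0.5)):
    """(squared error in the signal domain, sum of squared errors in the subbands)."""
    bands = analysis(x)
    coded = [quantize(b, s) for b, s in zip(bands, steps)]
    total = np.sum((x - synthesis(*coded)) ** 2)
    per_band = sum(np.sum((b - c) ** 2) for b, c in zip(bands, coded))
    return total, per_band

x = np.random.default_rng(1).normal(size=64)
print(distortions(haar_analysis, haar_synthesis, x))        # the two values agree
print(distortions(nonorth_analysis, nonorth_synthesis, x))  # they generally differ
```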
Filter size Good out-of-band rejection or high regularity require long filters. Besides their computational complexity, long filters are usually avoided because they
tend to spread coding errors. For example, sharp edges introduce distortions be- 
cause high-frequency channels are coarsely quantized. If the filters are long (and 
usually their impulse response has several sign changes), this causes an annoying 
artifact known as ringing around edges. Therefore, filters used in audio subband 
compression, such as length-32 filters, are too long for image compression. Instead, 
shorter "smooth" filters are preferred. Sometimes both their impulse and their step 
response are considered from a perceptual point of view [167]. The step response 
is important since edges in images will generate step responses at least in some 
of the channels. Highly oscillating step responses will require more bits to code, 
and coarse quantization will produce oscillations which are related to the step re- 
sponse. As can already be seen from this short discussion, there is an intertwining 
between the choice of filters and the type of quantization that follows. However, 
it is clear that the frequency-domain criteria used in audio (sharp cutoff, strong out-of-band rejection) have little meaning in the image compression context, where time-domain arguments, such as ringing, are more important.
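A simple time-domain measure of the ringing discussed above is the overshoot of a filter's step response. The sketch compares a short smooth lowpass with a long, sharp one. Neither filter comes from the text: [1, 2, 1]/4 stands in for a short "smooth" filter, and a 31-tap truncated ideal half-band lowpass stands in for a long filter with several sign changes. The short filter has a monotone step response. The long one overshoots its final value by several percent, which is what shows up as ringing around edges.

```python
import numpy as np

def step_overshoot(h: np.ndarray) -> float:
    """Peak of the step response above its final value, relative to that value."""
    s = np.cumsum(h)  # step response of an FIR filter
    return float((s.max() - s[-1]) / s[-1])

short_smooth = np.array([1.0, 2.0, 1.0]) / 4.0  # 3 taps, all positive
n = np.arange(-15, 16)
long_sharp = 0.5 * np.sinc(n / 2)               # 31-tap truncated ideal half-band lowpass

print(step_overshoot(short_smooth))             # 0.0: no ringing
print(round(step_overshoot(long_sharp), 3))     # about 0.08
```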

Regularity An orthogonal filter with a certain number of zeros at the aliasing frequency ($\pi$ in the two-channel case) is called regular if its iteration tends to a continuous function (see Section 4.4). The importance of this property for coding is
potentially twofold when the decomposition is iterated. First, the presence of many 
zeroes at the aliasing frequency can improve the coding gain and second, compres- 
sion artifacts might be less objectionable. To investigate the first effect, Rioul [243] 
compared the compression gain for filters of varying regularity used in a wavelet 
coder, or octave-band subband coder, with four stages. The experiment included 
bit allocation, quantization, and entropy coding and is thus quite realistic. The 
results are quite interesting: Some regularity is desired (the performance with no 
regularity is poor) and higher regularity improves compression further (but not 
substantially). 

As for the compression artifacts, the following argument shows that the filters 
should be regular when an octave-band decomposition is used: Assume a single 
quantization error in the lowpass channel. This will add an error to the reconstructed signal which depends only on the equivalent iterated lowpass filter. If the iterated filter is smooth, this will be less noticeable than if it is a highly irregular function (even though both contribute the same MSE). Note also that the lowest band is upsampled $2^{i-1}$ times (where $i$ is the number of iterations) and thus, the
iterated filter's impulse response is shifted by large steps, making irregular patterns 
in the impulse response more visible. 
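The equivalent iterated lowpass filter can be computed directly. After J stages it is $H(z)H(z^2)\cdots H(z^{2^{J-1}})$. The sketch builds it by repeatedly upsampling the filter and convolving. The Haar and length-4 Daubechies filters are hypothetical example choices. Scaled by $2^{J/2}$, the taps approximate samples of the limit function, as in the cascade algorithm. For Haar the result is a box; for the Daubechies filter its shape reveals how smooth or irregular the limit is.

```python
import numpy as np

def iterated_lowpass(h: np.ndarray, iterations: int) -> np.ndarray:
    """Equivalent lowpass H(z) H(z^2) ... H(z^(2^(iterations-1)))."""
    g = np.array([1.0])
    for i in range(iterations):
        up = np.zeros((len(h) - 1) * 2 ** i + 1)
        up[:: 2 ** i] = h  # H(z^(2^i)): 2^i - 1 zeros between the taps
        g = np.convolve(g, up)
    return g

haar = np.array([1.0, 1.0]) / np.sqrt(2)
s3 = np.sqrt(3)
daub4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))
J = 5
for name, h in (("haar", haar), ("daub4", daub4)):
    g = iterated_lowpass(h, J)
    print(name, len(g), np.round(g[:4] * 2 ** (J / 2), 3))
```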

In the case of biorthogonal systems such as linear phase FIR filter banks, one is 
often faced with the case where either the analysis or the synthesis is regular, but
not both. In that case, it is preferable to use the regular filter at the synthesis, by 
the same argument as above. Visually, an irregular analysis is less noticeable than 
an irregular synthesis, as can be verified experimentally. 

When the decomposition is not iterated, regularity is of little concern. A typical 
example is the lapped orthogonal transform, that is, a multi-channel filter bank 
which is applied only once. 

Frequency selectivity What is probably the major criterion in audio subband filter 
design is of much less concern in image compression. Aliasing, which is a major 
problem in audio, is much less disturbing in images [331]. The desire for short filters 
limits the frequency selectivity as well. One advantage of frequency selectivity is 
that perceptual weighting of errors is easier, since errors will be confined to the 
band where they occur. 

In conclusion, subband image coding requires relatively short and smooth filters, 
with some regularity if the decomposition is iterated. 

Quantization of the Subbands There are basically two ways to approach quantization of a subband-decomposed image: Either the subbands are quantized independently of each other, or dependencies are taken into account.

Independent quantization of the subbands While the subbands are only independent 
if the input is a Gaussian random variable and the filters decorrelate the bands, the 
independence assumption is often made because it makes the system much simpler. 
Different tree structures will produce subbands with different behaviors, but the 
following facts usually hold: 

(a) The lowest band, being a lowpass and downsampled version of the original, 
has a behavior much like the original image. That is, traditional quantization 
methods used for images can be applied here as well, such as DPCM [337] or 
even transform coding [174, 285]. 

(b) The highest bands have negligible energy and can usually be discarded with 
no noticeable loss in visual quality. 

(c) Except along edges, little correlation remains within higher bands. Because of 
the directional filtering, the edges are confined to certain directions in a given 
subband. Also, the probability density function of the pixel values peaks
at zero and falls off very rapidly. While it is often modeled as a Laplacian
distribution, it actually falls off more rapidly. It is more adequately fitted
with a generalized Gaussian pdf with faster decay than the Laplacian pdf [329].
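
For reference, a zero-mean generalized Gaussian density with shape parameter $\beta$ and variance $\sigma^2$ can be evaluated as follows. This is a minimal sketch of the standard parameterization, not code taken from [329]: $\beta = 1$ reproduces the Laplacian and $\beta = 2$ the Gaussian, and the value of $\beta$ that best fits a given subband has to be estimated from data.

```python
import math


def generalized_gaussian_pdf(x, beta, sigma=1.0):
    """Zero-mean generalized Gaussian density with shape beta and variance sigma**2.

    beta = 2 gives the Gaussian, beta = 1 the Laplacian; smaller beta gives a
    sharper peak at zero.
    """
    a = sigma * math.sqrt(math.gamma(1.0 / beta) / math.gamma(3.0 / beta))
    return beta / (2.0 * a * math.gamma(1.0 / beta)) * math.exp(-(abs(x) / a) ** beta)


def laplacian_pdf(x, sigma=1.0):
    """Laplacian density with variance sigma**2, for comparison."""
    return math.exp(-math.sqrt(2.0) * abs(x) / sigma) / (math.sqrt(2.0) * sigma)


if __name__ == "__main__":
    for beta in (0.5, 0.7, 1.0, 2.0):
        # peak height at zero for unit variance; it grows as beta shrinks
        print(beta, generalized_gaussian_pdf(0.0, beta))
```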

Besides the lowband compression, which uses known image coding methods, the
bulk of the compression is obtained by appropriate quantization of the high bands. 
The following quantizers are typically used: 

(a) Lloyd quantizers fitted to the distribution of the particular band to be
quantized. Tables of such Lloyd quantizers for generalized Gaussian pdf's and
decay values of interest for image subbands can be found in [329].

(b) Uniform quantizers with a so-called dead zone, which maps a region around the
origin (typically twice the step size used elsewhere) to zero. Such dead-zone
quantizers have proven useful because they increase compression substantially
with little loss of visual quality, since they tend to eliminate what is essentially
noise in the subbands [111].
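
A dead-zone uniform quantizer of the kind described in (b) takes only a few lines. In this sketch the zero bin is $(-\Delta, \Delta)$, twice the step $\Delta$ used elsewhere, and nonzero indices are reconstructed at bin midpoints; the midpoint rule and the function names are assumptions, and practical coders may bias the reconstruction toward zero.

```python
def deadzone_quantize(x, step):
    """Index of a uniform quantizer whose zero bin (-step, step) is twice the other bins."""
    if abs(x) < step:
        return 0
    index = int(abs(x) // step)  # |x| in [k*step, (k+1)*step) maps to k
    return index if x > 0 else -index


def deadzone_reconstruct(index, step):
    """Midpoint reconstruction; index 0 maps back to 0."""
    if index == 0:
        return 0.0
    mag = (abs(index) + 0.5) * step
    return mag if index > 0 else -mag


if __name__ == "__main__":
    band = [0.3, -0.8, 1.2, 0.1, -2.7, 0.05]
    q = [deadzone_quantize(v, 1.0) for v in band]
    print(q, [deadzone_reconstruct(k, 1.0) for k in q])
```

Small, noise-like values fall into the wide zero bin, which is what makes the subsequent run-length coding of high bands effective.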

Because entropy coding is used after quantization, uniform quantizers are nearly 
optimal [285]. Thus, since uniform quantizers are much easier to implement than 
Lloyd quantizers, the former are usually chosen, unless the variable rate associated 
with entropy codes has to be avoided. Note that vector quantization could be used 
in the subbands, but its complexity is usually not worthwhile since there is little 
dependence between pixels anyway. 
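
For comparison, a Lloyd quantizer can be designed directly from training samples, for instance the coefficients of one subband. The sketch below is the classical Lloyd iteration (nearest-neighbor thresholds alternating with cell centroids), started from sample quantiles; it is not the tabulated design of [329], and the names `lloyd_design` and `lloyd_quantize` are hypothetical.

```python
import bisect
import random


def lloyd_design(samples, levels, max_iter=500):
    """Design a scalar Lloyd quantizer from training samples.

    Alternates the nearest-neighbor condition (thresholds halfway between
    reconstruction levels) and the centroid condition (each level is the mean
    of the samples in its cell) until the levels stop changing.
    """
    s = sorted(samples)
    n = len(s)
    recon = [s[int((k + 0.5) * n / levels)] for k in range(levels)]  # quantile start
    for _ in range(max_iter):
        thresholds = [(recon[k] + recon[k + 1]) / 2 for k in range(levels - 1)]
        sums, counts = [0.0] * levels, [0] * levels
        for x in s:
            k = bisect.bisect_right(thresholds, x)
            sums[k] += x
            counts[k] += 1
        new = [sums[k] / counts[k] if counts[k] else recon[k] for k in range(levels)]
        if new == recon:  # partition no longer changes
            break
        recon = new
    thresholds = [(recon[k] + recon[k + 1]) / 2 for k in range(levels - 1)]
    return thresholds, recon


def lloyd_quantize(x, thresholds, recon):
    """Map x to the reconstruction level of its cell."""
    return recon[bisect.bisect_right(thresholds, x)]


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(0.0, 1.0) for _ in range(20000)]
    t, r = lloyd_design(data, 2)
    print(r)  # close to +/- sqrt(2/pi), about 0.798, for a unit Gaussian
```

The uniform quantizer avoids this design loop entirely, which is one practical reason for the preference described above.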

An important consideration is the relative perceptual importance of various 
subbands. This leads to a weighting of the MSE in various subbands. This weighting 
function can be derived through perceptual experiments by finding the level of "just 
noticeable noise" in various bands [252]. As expected, high bands tolerate more 
noise because the human visual system becomes less sensitive at high frequencies. 
Note that more sophisticated models would include masking as well. 

Quantization across the bands Looking at subband decomposed images, it is clear 
that the bands are not independent. A typical example is the representation of a 
vertical edge. It is visible in the lowpass image and appears in every band
that contains horizontal highpass filtering. It has thus been suggested to use vector 
quantization across the bands instead of in the bands [329, 332]. While there is 
some gain in doing so, there is also the following problem: Because the subbands are 
downsampled versions of the original, we have a shift-variant system. Thus, small 
shifts can produce changes in the subband signals which reduce the correlation. 
That is, while visually the edge is "preserved", the exact values in the various 
bands depend strongly on the location and are thus difficult to predict from band 
to band. In Section 7.3.4, we will see schemes which, by using an approach that 
does not rely on vector quantization but simply on local energy, can make use of 
some dependence between bands. 
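
The shift variance mentioned above is easy to demonstrate in one dimension. In this sketch (Haar highpass filter followed by downsampling by 2, with an arbitrarily chosen sign convention), moving a step edge by a single sample makes it disappear entirely from the highpass band, which is why exact coefficient values are hard to predict from band to band.

```python
import math


def haar_highpass_subband(x):
    """Haar highpass filtering followed by downsampling by 2.

    Pairs (x[2k], x[2k+1]) are differenced; the sign convention is arbitrary.
    """
    return [(x[2 * k] - x[2 * k + 1]) / math.sqrt(2) for k in range(len(x) // 2)]


def energy(v):
    return sum(c * c for c in v)


edge = [0.0] * 7 + [1.0] * 9      # step between samples 6 and 7
shifted = [0.0] * 8 + [1.0] * 8   # the same step, one sample later

print(energy(haar_highpass_subband(edge)))     # approximately 0.5
print(energy(haar_highpass_subband(shifted)))  # 0.0: the edge vanishes from this band
```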

It should be noted that straightforward vector quantization across bands
is easiest when equal-size subbands are used. In the case of an octave-band
decomposition, the vector should use pixels at each level that correspond to the same
region of the original signal. That is, the number of pixels should be inversely
proportional to scale. The comparison of vector quantization for equally-spaced bands
and octave-spaced bands is shown in Figure 7.22 for the one-dimensional case for
simplicity.

Figure 7.22 Vector quantization across the bands in subband decomposition.
(a) Uniform decomposition. (b) Octave-band, or wavelet, decomposition. Note
that the number of samples in the various bands corresponds to a fixed region
of the input signal.
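
For the octave-band case, gathering the coefficients that cover one region of the input can be sketched as below for a one-dimensional decomposition with $J$ levels. The band layout (`details[j]` holding level $j+1$, finest first) and the order of the coefficients inside the vector are assumptions made for illustration.

```python
def cross_band_vector(lowpass, details, m):
    """Gather the coefficients of a J-level 1-D octave-band decomposition that
    cover input region m (2**J input samples).

    lowpass    -- coarsest lowpass band (one sample per region)
    details[j] -- detail band at level j + 1, finest first (2**(J-1-j) samples per region)
    """
    J = len(details)
    vec = [lowpass[m]]
    for j in range(J - 1, -1, -1):  # coarsest detail band first
        width = 2 ** (J - 1 - j)
        vec.extend(details[j][m * width:(m + 1) * width])
    return vec


if __name__ == "__main__":
    N, J = 32, 3
    low = [("a3", k) for k in range(N // 8)]
    dets = [[("d%d" % (j + 1), k) for k in range(N // 2 ** (j + 1))] for j in range(J)]
    print(cross_band_vector(low, dets, 2))
```

Each vector has $2^J$ entries, one from the coarsest bands and twice as many at each finer level, matching the picture in Figure 7.22(b).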

Bit Allocation For bit allocation between the bands, one can directly use the 
procedures developed in Section 7.1.2, at least if the filters are orthogonal. Then, 
the total distortion is the sum of the subband distortions, and the total rate is the
sum of rates for the various bands. In the nonorthogonal case, the distortion is not 
additive, but can be approximated as such. 

The typical allocation problem is the following: For each channel $i$, one has a
choice from a set of quantizers $\{q_{i,j}\}$. Choosing a given quantizer $q_{i,j}$ will
produce a distortion $d_{i,j}$ and a rate $r_{i,j}$ for channel $i$ (one can use weighted
distortion as well). The problem is to find which combination of quantizers in the
various channels will produce the minimum squared error while satisfying the budget
constraint. The optimal solution is found using the constant-slope solution as
described in Section 7.1.2. The pairs $(d_{i,j}, r_{i,j})$, that is, the operational
rate-distortion curves, can be measured over a set of representative images and then
used as a fixed allocation. The problem is that, when applied to a particular image,
the budget might not be met. On the other hand, given an image to be coded, one can
measure the operational rate-distortion curves and use the constant-slope allocation
procedure. This will guarantee an optimal solution, but is computationally expensive.
Finally, one can use allocations based on probability density functions, in which
case it is often sufficient to measure the variance of a particular channel in order
to find its allocation (see (7.1.19) for example). Note that the rates used in the
allocation procedure are after entropy coding.

Table 7.4 Variances in the various bands of a uniform decomposition (defined as in
Figure 7.23).

            LL         LH         HH         HL
    HL      0.58959    0.86237    1.77899    0.88081
    HH      2.87483    6.71625    8.56729    3.25402
    LH      23.5474    33.4055    60.9195    14.8490
    LL      2711.45    56.0058    52.5202    13.9685
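
The constant-slope allocation described above can be sketched as follows. Rates are estimated by the first-order entropy of the quantizer indices (ignoring run-length coding, as in the uniform-decomposition example of this section), a plain uniform quantizer stands in for the dead-zone quantizer, and the slope $\lambda$ is found by bisection so that the weighted average rate meets the budget. The weights are the fractions of samples in each band; all names are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """First-order entropy (bits/symbol) of a list of quantizer indices."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())


def rd_curve(samples, steps):
    """(rate, distortion) per step size, using a plain uniform quantizer and
    first-order entropy as the rate estimate (an approximation)."""
    curve = []
    for step in steps:
        idx = [round(x / step) for x in samples]
        mse = sum((x - q * step) ** 2 for x, q in zip(samples, idx)) / len(samples)
        curve.append((entropy_bits(idx), mse))
    return curve


def allocate(curves, weights, budget, iters=60):
    """Constant-slope allocation: for a slope lam every channel picks the point
    minimizing d + lam * r; lam is bisected until the weighted rate meets budget.
    If the budget is infeasible, the lowest-rate choices are returned."""
    def pick(lam):
        choice = [min(range(len(c)), key=lambda j: c[j][1] + lam * c[j][0]) for c in curves]
        rate = sum(w * c[j][0] for w, c, j in zip(weights, curves, choice))
        return choice, rate

    lo, hi = 0.0, 1e9
    for _ in range(iters):
        mid = (lo + hi) / 2
        if pick(mid)[1] > budget:
            lo = mid  # too many bits: penalize rate more
        else:
            hi = mid
    return pick(hi)[0]


if __name__ == "__main__":
    curves = [[(0, 100.0), (1, 25.0), (2, 6.0)], [(0, 4.0), (1, 1.0), (2, 0.25)]]
    print(allocate(curves, [0.5, 0.5], 1.0))  # the high-variance band gets the bits
```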

Entropy Coding Substantial reductions in rate, especially in the case of uniform
quantizers, are obtained by entropy coding quantized samples or groups of samples.
Any of the techniques discussed in Section 7.1.3 can be used, such as Huffman
coding. Since Huffman codes are only within one bit of the true entropy [109], they
tend to be inefficient for small alphabets. Thus, symbols from small alphabets
are gathered into groups and vector Huffman coded (see [285]). Another option is to
use vector quantization to group samples [256]. Because higher bands tend to have
large numbers of zeros (especially after dead-zone quantizers), run-length coding
and an end-of-block symbol can be used to increase compression substantially.
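
One common way to combine run-length coding with an end-of-block symbol is sketched below: each nonzero value is emitted together with the number of zeros preceding it, and everything after the last nonzero value is replaced by a single EOB symbol. The (run, value) pairing is an assumption modeled on JPEG-style coders; in a complete coder these symbols would subsequently be Huffman coded.

```python
def run_length_encode(coeffs):
    """Encode as (zero_run, value) pairs; trailing zeros become a single "EOB"."""
    last = max((i for i, c in enumerate(coeffs) if c != 0), default=-1)
    symbols, run = [], 0
    for c in coeffs[:last + 1]:
        if c == 0:
            run += 1
        else:
            symbols.append((run, c))
            run = 0
    symbols.append("EOB")
    return symbols


def run_length_decode(symbols, length):
    """Inverse of run_length_encode for a block of known length."""
    out = []
    for s in symbols:
        if s == "EOB":
            break
        run, value = s
        out.extend([0] * run)
        out.append(value)
    out.extend([0] * (length - len(out)))
    return out


if __name__ == "__main__":
    block = [0, 0, 5, 0, -1, 0, 0, 0]
    print(run_length_encode(block))  # [(2, 5), (1, -1), 'EOB']
```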

Examples Two typical coding examples will be described in some detail. The first 
is a uniform separable decomposition. The second is an octave-band or constant 
relative bandwidth decomposition (often called a wavelet decomposition). 
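
The two decompositions can be sketched with a separable four-band split. To keep the example short and self-contained, the Haar filter pair stands in for the length-12 QMFs and the length-8 Daubechies filters used in the examples, and no boundary extension is needed because the image size is a power of two. Band keys are (horizontal, vertical) letter strings in the spirit of Figures 7.23 and 7.25; the spectral ordering of the bands is not modeled.

```python
import math

S = 1 / math.sqrt(2)


def haar_split(x):
    """One Haar analysis stage on a 1-D list: (lowpass, highpass), each downsampled by 2."""
    lo = [(x[2 * k] + x[2 * k + 1]) * S for k in range(len(x) // 2)]
    hi = [(x[2 * k] - x[2 * k + 1]) * S for k in range(len(x) // 2)]
    return lo, hi


def transpose(m):
    return [list(r) for r in zip(*m)]


def four_band(img):
    """Separable stage: split every row (horizontal), then every column (vertical).
    Keys are (horizontal letter, vertical letter)."""
    rows_lo, rows_hi = zip(*(haar_split(r) for r in img))
    bands = {}
    for h, half in (("L", rows_lo), ("H", rows_hi)):
        cols_lo, cols_hi = zip(*(haar_split(c) for c in transpose(half)))
        bands[(h, "L")] = transpose(cols_lo)
        bands[(h, "H")] = transpose(cols_hi)
    return bands


def uniform_16(img):
    """Split every band of a four-band decomposition once more: 16 equal subbands."""
    out = {}
    for (h1, v1), band in four_band(img).items():
        for (h2, v2), sub in four_band(band).items():
            out[(h1 + h2, v1 + v2)] = sub
    return out


def octave(img, levels):
    """Iterate the four-band split on the (L, L) band only: 3 * levels + 1 subbands."""
    out, low, prefix = {}, img, ""
    for _ in range(levels):
        bands = four_band(low)
        for (h, v), b in bands.items():
            if (h, v) != ("L", "L"):
                out[(prefix + h, prefix + v)] = b
        low = bands[("L", "L")]
        prefix += "L"
    out[(prefix, prefix)] = low
    return out


def energy(bands):
    return sum(c * c for b in bands.values() for row in b for c in row)
```

Because the Haar pair is orthonormal, both decompositions preserve the energy of the image, so the squared errors in the bands add up to the squared error of the reconstruction.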

Uniform decomposition By using a separable decomposition into four bands and
iterating it once, we obtain 16 subbands as shown in Figure 7.23. The resulting
subband images are shown in Figure 7.24. The filters used are linear phase length-12
QMF's [144], and the image was symmetrically extended before filtering. The
variances of the samples in the bands are shown in Table 7.4. We code the lowest
subband (LL,LL) with JPEG (see Section 7.3.1). For all other bands, we use
uniform quantization with a dead zone of twice the step size used elsewhere. Using
a set of step sizes, one can derive rate-distortion curves by measuring the entropy
of the resulting quantized channels. A true operational rate-distortion curve would
have to include run-length coding and actual entropy coding. Based on these
rate-distortion curves, one can perform an optimal constant-slope bit allocation, that
is, one can choose the optimal quantizer step sizes for the various bands. The step
sizes for a budget of 0.5 bits/pixel are listed in Table 7.5. A set of Huffman codes
and run-length codes is designed for each subband channel. Note that the special
symbol "start of run" (SR) is entropy coded as any other nonzero pixel. Altogether,
one obtains a final rate of 0.497 bits/pixel (the difference in rate comes from the
fact that bit allocation was based on entropy measures). The coded image has an
$\mathrm{SNR}_p$ of 30.38 dB. Figure 7.27 (top row) shows the compressed Barbara image
and a detail at the same rate.

Figure 7.23 Uniform subband decomposition of an image into 16 subbands.
The spectral decomposition and ordering of the channels is shown. The first
two letters correspond to horizontal filtering and the last two to vertical
filtering. LH, for example, means that a lowpass is used in the first stage and
a highpass in the second. The ordering is such that frequencies increase from
left to right and from bottom to top.

Figure 7.24 Uniform subband decomposition of the Barbara image. The
ordering of the subbands is given in Figure 7.23.

Table 7.5 Step sizes for the quantizers in the various bands (as defined in
Figure 7.23), for a target rate of 0.5 bits/pixel. The lowest band was JPEG
coded, and the step size corresponds to the quality factor (QF) used in JPEG.

            LL         LH         HH         HL
    HL      9.348      8.246      8.657      22.318
    HH      8.400      10.161     8.887      13.243
    LH      6.552      7.171      10.805     16.512
    LL      QF=89      8.673      11.209     15.846

Figure 7.25 Octave-band or wavelet decomposition of an image into unequal
subbands. The spectral decomposition and ordering of the channels is shown.
[Figure: bands labeled LLL,LLL; LLH,LLL; LLL,LLH; LLH,LLH; LH,LL; LL,LH;
LH,LH; H,L; L,H; H,H.]



Octave-band decomposition Instead of uniformly decomposing the spectrum of the
image, we iterate a separable four-band decomposition three times. The resulting
split of the spectrum is shown in Figure 7.25, together with the subband images
in Figure 7.26. Here, we used Daubechies' maximally flat orthogonal filters
of length 8. At the boundaries, we used periodic extension. The variances in the
bands are shown in Table 7.6. Histograms of pixel values of the bands are similar
to the ones in a uniform decomposition. Because the lowest band (LLL,LLL) is
small enough (64 x 64 pixels), we use scalar quantization on it as on all other bands.
Again, uniform quantizers with a double-sized dead zone are used, and rate-distortion
curves are derived for bit-allocation purposes. The resulting step sizes for the target
bit rate of 0.5 bits/pixel are given in Table 7.7.

Figure 7.26 Subband images corresponding to the spectral decomposition
shown in Figure 7.25.

Table 7.6 Variances in the different bands of an octave-band decomposition
(defined as in Figure 7.25).

    Band        Variance
    LLL,LLL     2559.8
    LLH,LLL     60.7
    LLL,LLH     43.8
    LLH,LLH     21.2
    LH,LL       55.4
    LL,LH       24.5
    LH,LH       33.7
    H,L         141.4
    L,H         15.2
    H,H         16.2

Table 7.7 Step sizes for the uniform quantizers in the octave-band or wavelet
decomposition of Figure 7.25, for a target rate of 0.5 bits/pixel.

    Band        Step size
    LLL,LLL     5.21
    LLH,LLL     3.69
    LLL,LLH     4.42
    LLH,LLH     4.08
    LH,LL       8.42
    LL,LH       9.22
    LH,LH       7.45
    H,L         17.23
    L,H         22.05
    H,H         21.57

The development of entropy coding (including run-length coding for higher
bands) is similar to the uniform-decomposition case discussed earlier. The final
rate is 0.499 bits/pixel, with an $\mathrm{SNR}_p$ of 29.21 dB. The coded image and a detail
are shown in Figure 7.27 (bottom row). Note that there is little difference between the
uniform and the octave-band decomposition results.

Figure 7.27 Compression results on the Barbara image. Top left: Subband coding
in 16 uniform bands at 0.4969 bits/pixel and $\mathrm{SNR}_p$ = 30.38 dB. Top right:
Detail of top left. Bottom left: Octave-band or wavelet compression at 0.4990
bits/pixel and $\mathrm{SNR}_p$ = 29.21 dB. Bottom right: Detail of bottom left.

We would like to emphasize that the above examples are "textbook examples" 
for illustration purposes. For example, no statistics over large sets of images were 
taken and thus, the entropy coders might perform poorly for a substantially different 
image. The aim was more to demonstrate the ingredients used in a subband/wavelet 
image coder. 
State-of-the-art coders, which can be found in the current literature, substantially
improve on the results shown here. Major differences with respect to the simple
coders we discussed so far are the following:

(a) Vector quantization can be used in the subbands, such as lattice vector quan- 
tization [13]. 

(b) Adaptive entropy coding is used to achieve immunity to changes in image statis- 
tics. 

(c) Adaptive quantization in the subbands can take care of busy versus nonbusy 
regions. 

(d) Dependencies across scales, either by vector quantization or prediction of 
structures across scales, are used to reduce the bit rate [176, 222, 259]. 

(e) Perceptual tuning using band sensitivity, background luminance level and
masking of noise due to high activity can improve the visual quality [252].

The last point, perceptual models for subband compression, is where most gain
can be obtained.

With these various fine tunings, good image quality for a compressed version 
of a 512 x 512 original image such as Barbara can be obtained in the range of 0.25 
to 0.5 bits/pixel. Note that the complexity level is still of the same order as the 
coders we presented and is comparable in order of magnitude to a DCT coder such 
as JPEG. 

7.3.4 Advanced Methods in Subband and Wavelet Compression 

The discussion so far has focused on standard methods. Below, we describe some 
more recent algorithms which are both of theoretical and practical interest. 

Zero-Tree Based Compression From looking at subband pictures such as those
in Figures 7.24 or 7.26, it is clear that there are some dependencies left among
the bands, as well as within the bands. Also, for natural images with decaying
spectra, it is unlikely to find significant high-frequency energy if there is little
low-frequency energy in the same spatial location. These observations led to the
development of an entropy coding method specifically tailored to octave-band or
wavelet coding. It is based on a data structure called a zero tree [176, 260], which
is the analog of zig-zag scanning and the end of block (EOB) symbol used in
the DCT.

The idea is to define a tree of zero symbols which starts at a root which is also
zero. Therefore, this root can be labeled as an "end of block". A few such zero
trees are shown in Figure 7.28. Because the tree grows as powers of four, a zero
tree allows us to disregard many insignificant symbols at once. Note also that a
zero tree gathers coefficients that correspond to the same spatial location in the
original image.

Figure 7.28 Zero-tree structure on an octave-band decomposed image. Three
possible trees in different bands are shown.
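
The parent-child relation behind a zero tree can be written down explicitly. The sketch assumes the layout commonly used in EZW descriptions: an $N$-level transform stored in one square array, with the coarsest lowpass band in the top-left corner, HL bands to the right, LH bands below and HH bands on the diagonal; each coefficient of the coarsest lowpass band has one child in each coarsest highpass band. The function names are hypothetical.

```python
def children(r, c, size, levels):
    """Children of coefficient (r, c) in a size x size array holding a `levels`-level transform."""
    s = size >> levels
    if r < s and c < s:  # coarsest lowpass band: one child per coarsest highpass band
        return [(r, c + s), (r + s, c), (r + s, c + s)]
    if r >= size // 2 or c >= size // 2:  # finest bands have no children
        return []
    return [(2 * r, 2 * c), (2 * r, 2 * c + 1), (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]


def descendants(r, c, size, levels):
    """All coefficients below (r, c) in the tree, level by level."""
    out, frontier = [], children(r, c, size, levels)
    while frontier:
        out.extend(frontier)
        frontier = [d for p in frontier for d in children(p[0], p[1], size, levels)]
    return out


if __name__ == "__main__":
    # a coefficient of HL_3 in an 8 x 8, 3-level transform: 4 children, 16 grandchildren
    print(len(descendants(0, 1, 8, 3)))
```

Declaring such a coefficient a zero-tree root thus settles 20 further coefficients with a single symbol.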

Zero trees have been combined with bit plane coding in an elegant and efficient 
compression algorithm due to Shapiro [260, 259]. It nicely incorporates many of
the key ideas presented in this section and demonstrates the effectiveness of
wavelet-based coding. The resulting algorithm is called the embedded zero-tree wavelet
(EZW) algorithm. Embedded means that the encoder can stop encoding at any desired
target rate. Similarly, the decoder can stop decoding at any point, resulting in the
image that would have been produced at the rate of the truncated bit stream. This
compression method produces excellent results without requiring a priori knowledge
of the image source, without prestored tables or codebooks, and without training.

The EZW algorithm uses the discrete-time wavelet transform decomposition
where at each level $i$ the lowest band is split into four more bands: $LL_{i+1}$,
$LH_{i+1}$, $HL_{i+1}$, and $HH_{i+1}$. In simulations in [260], six levels are used with
length-9 symmetric filters given in [1].

The second important ingredient is that the absence of significance across scales
is predicted by exploiting self-similarity inherent in images. A coefficient $x$ is called
insignificant with respect to a given threshold $T$ if $|x| < T$. The assumption is that
if $x$ is insignificant, then all of its descendants of the same orientation in the same
spatial location at all finer scales are insignificant as well. We call a coefficient at
a coarse scale a parent. All coefficients at the next finer scale at the same spatial
location and of similar orientation are its children. All coefficients at all finer scales
at the same spatial location and of similar orientation are its descendants. Although
there exist counterexamples to the above assumption, it holds true most of the
time. One can therefore make use of it and code such a parent as a zero-tree root
(ZTR), thereby avoiding coding all its descendants. When the assumption is not
true, that is, the parent is insignificant but there exists a significant descendant down
the tree, then such a parent is coded as an isolated zero (IZ). To code the
coefficients, Shapiro uses four symbols: ZTR, IZ, POS for a positive significant
coefficient, and NEG for a negative significant one. In the highest bands, which
do not have any children, IZ and ZTR are merged into a zero symbol (Z). The
order in which the coefficients are scanned is of importance as well. It is performed
so that no child is scanned before its parent. Thus, one scans bands $LL_N$, $HL_N$,
$LH_N$, $HH_N$, and moves on to scale $N-1$, scanning $HL_{N-1}$, $LH_{N-1}$, $HH_{N-1}$,
until reaching the starting scale $HL_1$, $LH_1$, $HH_1$. This scanning pattern orders the
coefficients in the order of importance, allowing for embedding.
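
A dominant pass can be sketched on top of the same parent-child relation and scanning order. In this sketch, coefficients found significant in earlier passes are skipped and treated as zero when zero trees are tested, descendants of a zero-tree root are skipped, and each band is scanned in raster order. The raster order inside a band, the array layout (coarsest lowpass band top-left, HL to the right, LH below) and the small test matrix are assumptions, not the data of [260].

```python
def children(r, c, size, levels):
    """Children of (r, c) in a size x size array holding a `levels`-level transform."""
    s = size >> levels
    if r < s and c < s:  # coarsest lowpass band: one child in each coarsest highpass band
        return [(r, c + s), (r + s, c), (r + s, c + s)]
    if r >= size // 2 or c >= size // 2:  # finest bands have no children
        return []
    return [(2 * r, 2 * c), (2 * r, 2 * c + 1), (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]


def descendants(r, c, size, levels):
    out, frontier = [], children(r, c, size, levels)
    while frontier:
        out.extend(frontier)
        frontier = [d for p in frontier for d in children(*p, size, levels)]
    return out


def scan_order(size, levels):
    """LL_N, HL_N, LH_N, HH_N, then HL, LH, HH at each finer scale; raster order in a band."""
    def block(r0, c0, n):
        return [(r, c) for r in range(r0, r0 + n) for c in range(c0, c0 + n)]

    order = block(0, 0, size >> levels)
    for n in range(levels, 0, -1):
        w = size >> n
        order += block(0, w, w) + block(w, 0, w) + block(w, w, w)
    return order


def dominant_pass(coeffs, levels, T, significant):
    """One EZW dominant pass; returns a list of (position, symbol) pairs.

    `significant` holds positions found significant in earlier passes: they are
    not coded again and count as zero when testing for zero trees.
    """
    size = len(coeffs)

    def magnitude(p):
        return 0 if p in significant else abs(coeffs[p[0]][p[1]])

    in_tree, symbols = set(), []
    for p in scan_order(size, levels):
        if p in significant or p in in_tree:
            continue
        x = coeffs[p[0]][p[1]]
        if abs(x) >= T:
            symbols.append((p, "POS" if x > 0 else "NEG"))
            continue
        desc = descendants(*p, size, levels)
        if not desc:
            symbols.append((p, "Z"))  # finest bands: IZ and ZTR merged
        elif all(magnitude(d) < T for d in desc):
            symbols.append((p, "ZTR"))
            in_tree.update(desc)  # descendants are implied, never scanned
        else:
            symbols.append((p, "IZ"))
    return symbols


if __name__ == "__main__":
    C = [[40, -20, 3, 2], [5, 6, -1, 0], [1, 25, 4, 1], [2, -3, 0, 2]]
    print([s for _, s in dominant_pass(C, 2, 16, set())])
```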

The next step is successive approximation quantization. It entails keeping at
all times two lists: the dominant list and the subordinate list. The dominant list
contains the coordinates of those coefficients that have not yet been found to be
significant. The subordinate list contains the magnitudes of those coefficients that
have been found to be significant. The process is as follows: We decide on the initial
threshold $T_0$ (for example, it could be half of the positive range of the coefficients)
and start with the dominant pass, where we evaluate each coefficient in the scanning
order described above to be one of the four symbols ZTR, IZ, POS and NEG.
Then we cut the threshold in half, obtaining $T_1$, and add another bit of precision
to the magnitudes on the list of coefficients known to be significant, that is, the
subordinate list. More precisely, we assign the symbols 0 and 1 depending on whether
the refinement leaves the reconstruction of a coefficient in the lower or upper half
of the previous bin. We reorder the coefficients in decreasing order of their
reconstructed magnitudes and go on to the dominant pass again with the threshold $T_1$.
Note that now those coefficients that have been found to be significant during a
previous pass are set to zero so that they do not preclude the possibility of finding
a zero tree. The process then alternates between these two passes until some stopping
condition is met, such as the bit budget being exhausted. Finally, the symbols are
losslessly encoded using adaptive arithmetic coding.
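
The subordinate pass only needs, for each significant coefficient, the interval its magnitude is known to lie in. The following sketch keeps `[value, low, width]` triples, emits one refinement bit per coefficient, reconstructs at interval centers, and stably re-sorts by reconstructed magnitude. With the coefficients 63, -34, 49, 47 (significant at $T_0 = 32$) and then -31, 23 (at $T_1 = 16$) it reproduces the numbers of Example 7.2; the function names are hypothetical.

```python
import math


def newly_significant(value, T):
    """A coefficient first found significant at threshold T has |value| in [T, 2T)."""
    return [value, T, T]


def reconstruct(entry):
    """Center of the current uncertainty interval, carrying the coefficient's sign."""
    value, low, width = entry
    return math.copysign(low + width / 2, value)


def subordinate_pass(entries):
    """Refine each [value, low, width] entry by one bit, in list order.

    Bit 1 means the magnitude lies in the upper half of [low, low + width).
    Afterwards the list is stably re-sorted by decreasing reconstructed magnitude.
    """
    bits = []
    for e in entries:
        value, low, width = e
        half = width / 2
        bit = 1 if abs(value) >= low + half else 0
        e[1], e[2] = (low + half if bit else low), half
        bits.append(bit)
    entries.sort(key=lambda e: -(e[1] + e[2] / 2))
    return bits


if __name__ == "__main__":
    entries = [newly_significant(v, 32) for v in (63, -34, 49, 47)]
    print(subordinate_pass(entries))            # [1, 0, 1, 0]
    print([reconstruct(e) for e in entries])    # [56.0, 56.0, -40.0, 40.0]
    entries += [newly_significant(v, 16) for v in (-31, 23)]
    subordinate_pass(entries)
    print([reconstruct(e) for e in entries])    # [60.0, 52.0, 44.0, -36.0, -28.0, 20.0]
```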

Example 7.2 EZW Example from [260] 

Let us consider a simple example given in [260]. We assume that we are given an 8 x 8
image whose 3-level discrete-time wavelet transform is given in Table 7.8. Since the largest
coefficient is 63, the initial threshold is $T_0 = 32$.

We start in the scanning order as we explained before. 63 is larger than 32 and thus
gets POS. -34 is larger than 32 in absolute value and gets NEG. We go on to -31, which is
smaller than 32 in absolute value. However, going through its tree, which consists of bands
$LH_2$ and $LH_1$, we see that it is not a root of a zero tree due to the large value 47.
Therefore, its assigned symbol is IZ. We continue with 23 and establish that it is a root of
a zero tree comprising bands $HH_2$ and $HH_1$. We continue the process in the scanning
order, except that we skip all those coefficients for which we have previously established
that they belong to a zero tree. The result of this procedure is given in Table 7.9.

Table 7.8 An example of a 3-level discrete-time wavelet transform of an
8 x 8 image.

Table 7.9 The first dominant pass through the coefficients.

    Subband   Coefficient   Symbol   Reconstruction
    LL3        63           POS       48
    HL3       -34           NEG      -48
    LH3       -31           IZ         0
    HH3        23           ZTR        0
    HL2        49           POS       48
    HL2        10           ZTR        0
    HL2        14           ZTR        0
    HL2       -13           ZTR        0
    LH2        15           ZTR        0
    LH2        14           IZ         0
    LH2        -9           ZTR        0
    LH2        -7           ZTR        0
    HL1         7           Z          0
    HL1        13           Z          0
    HL1         3           Z          0
    HL1         4           Z          0
    LH1        -1           Z          0
    LH1        47           POS       48
    LH1        -3           Z          0
    LH1        -2           Z          0

After we have scanned all available coefficients, we are ready to go on to the first
subordinate pass. We commence by halving the threshold, to obtain $T_1 = 16$, as well
as the quantization intervals. The resulting intervals are now [32,48) and [48,64). The
first significant value, 63, obtains a 1 and is reconstructed to 56. The second one, -34,
gets a 0 and is reconstructed to -40; 49 gets a 1 and is reconstructed to 56; and finally,
47 gets a 0 and is reconstructed to 40. We then order these values in decreasing order of
reconstructed magnitude, that is, (63, 49, 34, 47). If we want to continue the process, we
start the second dominant pass with the threshold of 16. We first set all significant values
from the previous pass to zero, in order to be able to identify zero trees. In this pass, we
establish that -31 in $LH_3$ is NEG and 23 in $HH_3$ is POS. All the other coefficients are
then found to be either zero-tree roots or zeros. We add 31 and 23 to the list of significant
coefficients and halve the quantization intervals, to obtain [16,24), [24,32), [32,40),
[40,48), [48,56), and [56,64). At the end of this pass, the revised list is
(63, 49, 47, 34, 31, 23), while the reconstructed list is (60, 52, 44, 36, 28, 20). This
process continues until, for example, the bit budget is met.

Adaptive Decomposition Methods In our discussions of subband and wavelet 
coding of images, we have seen that both full-tree decompositions and octave-band 
tree decompositions are used. A natural question is: Why not use arbitrary
binary-tree decompositions, and in particular, choose the best binary tree for a given
image? This is exactly what the best basis algorithm of Coifman, Meyer, Quake and 
Wickerhauser [62, 64] attempts. Start with a collection of bases given by all binary 
subband coding trees of a given depth, called wavelet packets (see Section 3.3.4). 
From a full tree, the best basis algorithm uses dynamic programming to prune back 
to the best tree, or equivalently, the best basis. 
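
The pruning step of the best basis algorithm can be sketched as a bottom-up recursion: a node is split only if the best cost of its two children is smaller than its own cost. The sketch uses a Haar wavelet packet and an additive $\ell^1$ cost purely as assumptions for illustration; the best-basis literature uses entropy-like additive costs, which can be passed in through the `cost` argument.

```python
import math

S = 1 / math.sqrt(2)


def haar_split(x):
    """One Haar analysis stage: (lowpass, highpass), each downsampled by 2."""
    lo = [(x[2 * k] + x[2 * k + 1]) * S for k in range(len(x) // 2)]
    hi = [(x[2 * k] - x[2 * k + 1]) * S for k in range(len(x) // 2)]
    return lo, hi


def l1_cost(x):
    """An additive cost (sum of magnitudes), chosen here for simplicity."""
    return sum(abs(c) for c in x)


def best_basis(x, depth, cost=l1_cost):
    """Bottom-up pruning of a Haar wavelet-packet tree of the given depth.

    Returns (cost, tree), where tree is "leaf" or a (lowpass_tree, highpass_tree)
    pair. A node is split only if its children's best cost is strictly smaller.
    """
    own = cost(x)
    if depth == 0 or len(x) < 2:
        return own, "leaf"
    lo, hi = haar_split(x)
    c_lo, t_lo = best_basis(lo, depth - 1, cost)
    c_hi, t_hi = best_basis(hi, depth - 1, cost)
    if c_lo + c_hi < own:
        return c_lo + c_hi, (t_lo, t_hi)
    return own, "leaf"


if __name__ == "__main__":
    print(best_basis([1.0] * 8, 3)[1])          # octave-band (wavelet) tree
    print(best_basis([1.0, -1.0] * 4, 3)[1])    # splits the highpass branch instead
```

Because the cost is additive, each subtree can be decided independently, which is what makes the dynamic programming pruning possible.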

In [233], the best basis algorithm was modified so as to be optimal in an
operational rate-distortion sense, that is, for compression. Assume we choose a certain
tree depth $K$ and, for each node of the tree, a set of quantizers. Thus, given an input
signal, we can evaluate an operational rate-distortion curve for each node of the
binary tree. Then, we can prune the full tree based on operational rate distortion.
Specifically, we introduce a Lagrange multiplier $\lambda$ (as we did in bit allocation, see
Section 7.1.2) and compute a cost $L(\lambda) = D + \lambda R$ for a root $r$ and its two children
$c_1$ and $c_2$. This is done at points of constant slope $-\lambda$. Then, if

$$L_r(\lambda) \le L_{c_1}(\lambda) + L_{c_2}(\lambda),$$

we can prune the children and keep the root; otherwise, we keep the children. The
comparison is made at constant-slope points (of slope $-\lambda$) on the respective
rate-distortion curves. Going up the tree in this fashion will result in an optimal binary
tree for the image to be compressed. Note that in order to apply the Lagrange
method, we assumed independence of the nodes, an assumption that might be
violated (especially for deep trees).
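
The rate-distortion pruning rule can be sketched in the same recursive way. Each node carries a list of operational (rate, distortion) points, one per quantizer, and for a fixed $\lambda$ the children are pruned whenever $L_r(\lambda) \le L_{c_1}(\lambda) + L_{c_2}(\lambda)$. The node representation and the numbers in the usage example are made up; in practice $\lambda$ would also be adjusted to meet the rate budget, as in bit allocation.

```python
def lagrangian_cost(rd_points, lam):
    """Best D + lam * R over a node's operational (rate, distortion) points."""
    return min(d + lam * r for r, d in rd_points)


def prune(node, lam):
    """Return (cost, tree) for the subtree at `node`.

    A node is a dict with key "rd" (list of (rate, distortion) pairs) and an
    optional key "children" (two child nodes). The children are pruned, and the
    root kept, whenever L_r <= L_c1 + L_c2 at slope -lam.
    """
    own = lagrangian_cost(node["rd"], lam)
    kids = node.get("children")
    if not kids:
        return own, "leaf"
    c1, t1 = prune(kids[0], lam)
    c2, t2 = prune(kids[1], lam)
    if own <= c1 + c2:
        return own, "leaf"
    return c1 + c2, (t1, t2)


if __name__ == "__main__":
    leaf = {"rd": [(0, 30.0), (5, 12.0)]}
    root = {"rd": [(0, 100.0), (10, 20.0)], "children": [leaf, leaf]}
    print(prune(root, 1.0))  # (30.0, 'leaf'): splitting does not pay off here
```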

An extension of this idea consists of considering not only frequency divisions
(obtained by a subband decomposition) but also splitting of the signal in time,
so that different wavelet packets can be used for different portions of the
time-domain signal (see also Figure 3.13). This is particularly useful if the signal is
nonstationary. The solution consists of jointly splitting in time and frequency
using a double-tree algorithm [132, 230] (one tree for frequency and another for
time splitting). Using dynamic programming and an operational rate-distortion
criterion, one can obtain the best time and frequency splittings. This algorithm was
applied to image compression in [15]. An example of space and frequency splitting
of the Barbara image is shown in Figure 7.29, showing that large regions with
similar characteristics are gathered into blocks, while busy regions get split into
many smaller blocks. Over each of these blocks, a specific wavelet packet is used.

Figure 7.29 Simultaneous space and frequency splitting of the Barbara image
using the double-tree algorithm. Black lines correspond to spatial segmentations,
while white lines correspond to frequency splits.

Methods Based on Wavelet Maximums Since edges are critical to image percep- 
tion [168], there is a strong motivation to find a compression scheme that contains 
edges as critical information. This is done in Mallat and Zhong's algorithm [184] 
which is based on wavelet maximums representations. The idea is to decompose 
the image using a redundant representation which approximates the continuous 
wavelet transform at scales which are powers of two. This can be done using non- 
downsampled octave-band filter banks. Because there is no downsampling, the 
decomposition is shift-invariant. If the highpass filter is designed as an edge de- 
tector (such as the derivative of a Gaussian), then we will have edges represented 
at all scales by some local maximums or minimums. Because the representation is 
redundant, keeping only these maximums/minimums still allows good reconstruction
of the original using an iterative procedure (based on alternating projections
onto convex sets [29, 70, 184]). While this is an interesting approach, it turns out 
that coding the edges is expensive. Also, textures are not easily represented and 
need separate treatment. Finally, the computational burden, even for reconstruc- 
tion only, is heavy due to the iterative algorithm involved. Thus, such an approach 
needs further research in order to fully assess its potential as an image compression 
method. 
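
The shift invariance of a non-downsampled decomposition, and the extraction of modulus maxima, can be illustrated in one dimension. The sketch below substitutes a simple Haar-like difference/average pair, spread out by $2^j$ taps at scale $j$ with circular boundaries, for the derivative-of-smoothing filters of [184], and it omits the iterative reconstruction entirely; all names are hypothetical.

```python
def undecimated_details(x, scales):
    """Redundant (non-downsampled) Haar-like decomposition with circular indexing.

    At scale j the filters are spread out by 2**j taps, so every output has the
    same length as x and the transform commutes with circular shifts.
    """
    s, details = list(x), []
    for j in range(scales):
        d = 2 ** j
        details.append([s[n] - s[n - d] for n in range(len(s))])  # difference: edge-detector-like
        s = [(s[n] + s[n - d]) / 2 for n in range(len(s))]        # smoothing for the next scale
    return details


def modulus_maxima(w):
    """Positions where |w| is a nonzero local maximum (circularly)."""
    n = len(w)
    out = []
    for k in range(n):
        a, left, right = abs(w[k]), abs(w[k - 1]), abs(w[(k + 1) % n])
        if a > 0 and a >= left and a >= right and (a > left or a > right):
            out.append(k)
    return out


if __name__ == "__main__":
    x = [0.0] * 20 + [1.0] * 44  # a step up at 20 and, circularly, a step down at 0
    for j, w in enumerate(undecimated_details(x, 3)):
        print(j, modulus_maxima(w))
```

Shifting the input by one sample shifts every list of maxima by one sample, which is the shift invariance that a downsampled transform lacks.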

Quantization Error Analysis in a Subband System In compression schemes 
we have seen so far, the approach has been to first design the linear transform 
and then find the best quantization and entropy coding strategies possible. The 
problem of analyzing the system as a whole, although of significant theoretical and 
practical importance, has not been addressed by many authors. One of the few 
works on the topic is due to Westerink, Biemond and Boekee [331]. The authors 
use the optimal scalar quantizer, the Lloyd-Max quantizer, to quantize the subbands.
For that particular quantizer, it can be shown that (see, for example, [143])

$$\sigma_y^2 = \sigma_x^2 - \sigma_q^2, \qquad (7.3.2)$$

where $\sigma_q^2$, $\sigma_x^2$, $\sigma_y^2$ are the variances of the quantization error, the input and
output signals, respectively. Consider now a so-called "gain plus additive noise" linear
model for this quantizer. Its input/output relationship is given by

$$\mathbf{y} = \alpha \mathbf{x} + \mathbf{r},$$

where $\mathbf{x}$, $\mathbf{y}$ are the input/output of the quantizer,⁷ $\mathbf{r}$ is the additive noise term,
and $\alpha$ is the gain factor ($\alpha < 1$). The main advantage of this model is that, by choosing

$$\alpha = 1 - \frac{\sigma_q^2}{\sigma_x^2}, \qquad (7.3.3)$$

the additive noise will not be correlated with the signal and (7.3.2) will hold. In
other words, to fit the model to our given quantizer, (7.3.3) must be satisfied. Note
also that the additive noise term, although uncorrelated with the input, is in general
correlated with the output signal.
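
Relations (7.3.2) and (7.3.3) can be checked numerically. The sketch designs a 4-level Lloyd quantizer on synthetic Gaussian samples (an assumption; any source distribution would do), sets $\alpha$ by (7.3.3), and forms the model noise $\mathbf{r} = \mathbf{y} - \alpha\mathbf{x}$. Once the Lloyd iteration has converged, each reconstruction level is the mean of its cell, so the output variance equals $\sigma_x^2 - \sigma_q^2$ and the sample covariance between $\mathbf{r}$ and $\mathbf{x}$ vanishes up to rounding.

```python
import bisect
import random


def lloyd_design(samples, levels, max_iter=1000):
    """Classical Lloyd iteration (thresholds at midpoints, levels at cell means)."""
    s = sorted(samples)
    n = len(s)
    recon = [s[int((k + 0.5) * n / levels)] for k in range(levels)]
    for _ in range(max_iter):
        thr = [(recon[k] + recon[k + 1]) / 2 for k in range(levels - 1)]
        sums, counts = [0.0] * levels, [0] * levels
        for x in s:
            k = bisect.bisect_right(thr, x)
            sums[k] += x
            counts[k] += 1
        new = [sums[k] / counts[k] if counts[k] else recon[k] for k in range(levels)]
        if new == recon:
            break
        recon = new
    return [(recon[k] + recon[k + 1]) / 2 for k in range(levels - 1)], recon


def variance(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / len(v)


def covariance(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


random.seed(3)
x = [random.gauss(0.0, 2.0) for _ in range(5000)]
thr, recon = lloyd_design(x, 4)
y = [recon[bisect.bisect_right(thr, a)] for a in x]
q = [a - b for a, b in zip(x, y)]             # quantization error
alpha = 1 - variance(q) / variance(x)         # gain factor, (7.3.3)
r = [b - alpha * a for a, b in zip(x, y)]     # additive noise of the linear model

print(variance(y), variance(x) - variance(q))  # equal up to rounding, (7.3.2)
print(covariance(r, x))                        # about 0: noise uncorrelated with the input
```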

The authors in [331] then incorporate this model into a QMF system (where the
filters are designed to cancel aliasing, as given in (3.2.34-3.2.35)). That is, each of
the two channel signals is quantized, which applies a gain factor $\alpha_i$ and generates
an additive noise $\mathbf{r}_i$. Consequently, the error at the output of the system can be
written as the sum of the error terms

$$E(z) = E_Q(z) + E_S(z) + E_A(z) + E_R(z),$$



7 Bold letters denote random variables. 
where

$$E_Q(z) = \frac{1}{2}\left[H^2(z) - H^2(-z) - 2\right] X(z),$$

$$E_S(z) = \frac{1}{2}\left[(\alpha_0 - 1) H^2(z) - (\alpha_1 - 1) H^2(-z)\right] X(z),$$

$$E_A(z) = \frac{1}{2}(\alpha_0 - \alpha_1) H(z) H(-z) X(-z),$$

$$E_R(z) = H(z) R_0(z^2) - H(-z) R_1(z^2).$$



Note that here, $z^2$ in $R_i(z^2)$ appears since the noise component passes through the
upsampler. This breakdown into different types of errors allows one to investigate
their influence and severity. Here, $E_Q$ denotes the QMF (lack of perfect reconstruction)
error, $E_S$ is the signal error (term with $X(z)$), $E_A$ is the aliasing error (term
with $X(-z)$), and $E_R$ is the random error. Note that only the random error $E_R$
is uncorrelated with the signal. The QMF error is insignificant and can be disregarded.
Aliasing errors become negligible if filters of length 12 or more are used.
Finally, the signal error determines the sharpness, while the random error is most
visible in flat areas of the image.

Joint Design of Quantization and Filtering in a Subband System Let us now
extend the idea from the previous section to more general subband systems. The
surprising result is that by changing the synthesis filter bank according to the
quantizer used, one can cancel all signal-dependent errors [161]. In other words, the
reconstruction error will be of only one type, that is, random error, uncorrelated
with the signal.

The idea is to use a general subband system with Lloyd-Max quantization and see whether one can eliminate certain types of errors. Note that here, no assumptions are made about the filters; that is, the filters $(H_0, H_1)$ and $(G_0, G_1)$ need not constitute a perfect reconstruction pair. Assume, however, that given $(H_0, H_1)$, we find $(T_0, T_1)$ such that the system is perfect reconstruction. Then, it can be shown that if the synthesis filters are chosen as

$$G_0(z) = \frac{1}{\alpha_0} T_0(z), \qquad G_1(z) = \frac{1}{\alpha_1} T_1(z),$$

where the $\alpha_i$ are the gain factors of the quantizer models, all errors depending on $X(z)$ and $X(-z)$ are cancelled and the only remaining error is the random error

$$E(z) = E_R(z) = \frac{1}{\alpha_0} T_0(z) R_0(z^2) + \frac{1}{\alpha_1} T_1(z) R_1(z^2),$$

where the $R_i(z)$ are the noise terms appearing in the linear model. In other words, by appropriate choice of synthesis filters, the only remaining error is uncorrelated with the signal. The potential benefit of this approach is that one has to deal only with a random, noise-like error at the output, which can then be alleviated with an appropriate noise removal technique. Note, however, that the random error has been boosted by dividing the terms by $\alpha_i < 1$. For more details, see [161].
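The cancellation can be observed on a toy two-channel system. For illustration only, we assume the Haar pair as (H0, H1), its perfect reconstruction synthesis pair (with a one-sample delay) as (T0, T1), hypothetical gains α0 = 0.9 and α1 = 0.7, and Gaussian noise for the r_i. With G_i = T_i/α_i the output equals the delayed input plus the random error alone, whereas the plain choice G_i = T_i leaves a signal-dependent residual.

```python
import numpy as np

def up(s):
    """Upsample by two: S(z) -> S(z^2)."""
    u = np.zeros(2 * len(s))
    u[::2] = s
    return u

s2 = np.sqrt(2.0)
H = [np.array([1.0, 1.0]) / s2, np.array([1.0, -1.0]) / s2]   # analysis pair (Haar, illustrative)
T = [np.array([1.0, 1.0]) / s2, np.array([-1.0, 1.0]) / s2]   # PR synthesis for H, one-sample delay
alphas = [0.9, 0.7]                                           # hypothetical quantizer gains

rng = np.random.default_rng(2)
x = rng.standard_normal(32)
noise = [rng.standard_normal(17) for _ in range(2)]          # r_0, r_1 (17 subband samples each)

def run(G):
    """Analysis by H, downsampling, y_i = alpha_i v_i + r_i, upsampling, synthesis by G."""
    out = np.zeros(len(x) + 3)
    for h, g, a, r in zip(H, G, alphas, noise):
        v = np.convolve(h, x)[::2]
        out += np.convolve(g, up(a * v + r))
    return out

delayed_x = np.concatenate(([0.0], x, [0.0, 0.0]))          # z^{-1} X(z), padded to the output length

G = [t / a for t, a in zip(T, alphas)]                        # G_i(z) = T_i(z) / alpha_i
E_R = sum(np.convolve(g, up(r)) for g, r in zip(G, noise))   # random error only
signal_dependent = run(G) - delayed_x - E_R

# For comparison, plain synthesis G_i = T_i leaves an error that depends on x.
E_R_plain = sum(np.convolve(t, up(r)) for t, r in zip(T, noise))
signal_dependent_plain = run(T) - delayed_x - E_R_plain

print(np.abs(signal_dependent).max(), np.abs(signal_dependent_plain).max())
```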

Nonorthogonal Subband Coding Most of the subband coding literature uses orthogonal filters, since otherwise the squared norm of the quantization error would not be preserved, possibly leading to a large reconstruction error. If nonorthogonal transforms are used, they are usually very close to orthogonal ones [14].

Moulin [200] shows that the poor performance of nonorthogonal transforms relative to orthogonal ones is due to an inappropriate formulation of the coding problem, rather than to the use of the nonorthogonal transform itself.

Let us recall how the usual subband decomposition/reconstruction is performed. 
An image $x$ goes through the analysis stage $H$ to produce the subband images

$$y = Hx.$$

The next step is to compute a quantized image $\hat{y}$,

$$\hat{y} = Q(y).$$

Finally, we reconstruct the image as

$$\hat{x} = G\hat{y},$$

where the system is perfect or near-perfect reconstruction. Moulin, instead, suggests the following: find the $\hat{y}_{\mathrm{opt}}$ that minimizes the squared error at the output,

$$E(\hat{y}_{\mathrm{opt}}) = \| G\hat{y}_{\mathrm{opt}} - x \|^2,$$

where $\hat{y}_{\mathrm{opt}}$ belongs to the set of all possible quantized images. Due to this constraint, the problem becomes a discrete optimization problem, which is solved using a numerical relaxation algorithm. Experiments on images show significant visual as well as MSE improvement. For more details, refer to [200].
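To make the reformulation concrete, here is a minimal sketch. It is not Moulin's relaxation algorithm from [200]. As assumptions, it uses a random nonorthogonal 8 × 8 analysis matrix H, its inverse as G, a uniform quantizer of step Δ = 0.5, and a simple coordinate-descent search over the quantizer grid, starting from ŷ = Q(Hx). Since the error is quadratic in each coordinate, the best grid value for that coordinate is the one nearest the unconstrained minimizer, so no update can increase ‖Gŷ − x‖².

```python
import numpy as np

rng = np.random.default_rng(3)
n, step = 8, 0.5
H = np.eye(n) + 0.4 * rng.standard_normal((n, n))   # nonorthogonal analysis (assumption)
G = np.linalg.inv(H)                                 # perfect reconstruction synthesis
x = rng.standard_normal(n)

def quantize(v):
    """Uniform quantizer with step `step`."""
    return step * np.round(v / step)

y_hat = quantize(H @ x)                              # conventional approach: y_hat = Q(Hx)
err_plain = np.sum((G @ y_hat - x) ** 2)

y_opt = y_hat.copy()
for _ in range(20):                                  # sweeps of coordinate descent
    for k in range(n):
        g = G[:, k]
        resid = G @ y_opt - x
        # Move y_k to the grid value nearest the minimizer of the 1-D quadratic in y_k.
        y_opt[k] = quantize(y_opt[k] - g @ resid / (g @ g))
err_opt = np.sum((G @ y_opt - x) ** 2)

print(err_plain, err_opt)                            # err_opt never exceeds err_plain
```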

7.4 Video Compression 

Digital video compression has emerged as an area of intense research and devel- 
opment activity recently. This is due to the demand for new video services such 
as high- definition television, the maturity of the compression techniques, and the 
availability of technology to implement state of the art coders at reasonable costs. 
Besides the large number of research papers on video compression, good examples 
of the increased activity in the field are the standardization efforts such as MPEG [173, 201] (the Moving Picture Experts Group of the International Organization for Standardization). While the video compression problem is quite different from
straight image coding, mainly because of the presence of motion, techniques suc- 
cessful with images are often part of video coding algorithms as well. That is, signal 
expansion methods are an integral part of most video coding algorithms and are 
used in conjunction with motion based techniques. 

This section will discuss both signal expansion and motion based methods used 
for moving images. We start by describing the key problems in video compression, 
one of which is compatibility between standards of various resolutions and has a 
natural answer in multiresolution coding techniques. Standard motion compensated 
video compression is described next, as well as the use of transforms for coding 
the prediction error signal. Then, pyramid coding of video, which attempts to 
get the best of subband and motion based techniques, is discussed. Subband or 
wavelet decomposition techniques in three dimensions are presented, indicating both 
their usefulness and their shortcomings. Finally, the emerging MPEG standard is 
discussed. 

Note that by intraframe coding we will denote video coding techniques where 
each frame is coded separately. On the other hand, interframe coding will mean 
that we take the time dimension and the correlation between frames into account. 

7.4.1 Key Problems in Video Compression 

Video is a sequence of images, that is, a three-dimensional signal. A number of key features distinguish video compression from a mere multidimensional extension of previously discussed compression methods. Moreover, the data rates
are several orders of magnitude higher than those in speech and audio (for exam- 
ple, digital standard television uses more than 200 Mbits/sec, and high- definition 
television more than 1 Gbits/sec). 

Motion Models in Video The presence of structures related to motion in the 
video signal indicates ways to achieve high compression by using model based pro- 
cessing. That is, instead of looking at the three-dimensional video signal as simply 
a sequence of images, one knows that very often, future images can be deduced 
from the past ones by some simple transformation such as translation. This is 
shown schematically in Figure 7.30, where two objects appear in front of a uniform 
background, one being still (no motion) and the other moving (simple, translational 
motion). It is clear that a compact description of this scene can be obtained by de- 
scribing the first image and then indicating only how the objects move in subsequent 
images. It turns out that most video scenes are well described by such motion models of objects, as well as by global modifications such as zooms and pans. Of course, a number of problems have to be addressed, such as occlusion or uncovering of background due to an object's movement. Overall, the motion based approaches in video processing have been very successful [207]. Note that motion is an "image-domain" phenomenon, since we are looking for displacements of image features. Thus, many of the motion estimation algorithms are of a correlative nature. An example is the block matching algorithm, which searches for local correlation maxima between successive images.

Figure 7.30 Moving objects in a video sequence. One object is still, whereas the other has a purely translational motion.



A Transform-Domain View Assume the following simplified view of video: a sin- 
gle object has a translational motion in front of a black background. One can verify 
that the three-dimensional Fourier transform is zero except on a plane orthogonal 
to the motion vector and passing through the origin. The values on the plane are 
equal to the two-dimensional Fourier transform of the object. That is, motion sim- 
ply tilts the Fourier transform of a still object. It seems therefore attractive to code 
the moving object in Fourier space, where the coding would reduce to coding of the 
object's Fourier transform and the direction of the plane. This idealized view has 
led to various proposals for video coding which would first include an appropriate
transform domain approximating Fourier space (such as a subband division) and 
then locate the region where the energy is mostly concentrated (corresponding to 
the tilted plane of the object). It would then disregard other Fourier components 
to achieve compression. While such an approach seems attractive at first sight, it 
has some shortcomings. 
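The tilted-plane picture can be checked with the discrete Fourier transform, under simplifying assumptions of our own: an N × N random texture translating circularly (with wrap-around) by an integer velocity v = (1, 3) pixels per frame over N frames, so that the DFT applies exactly. In this periodic setting, all the energy lies on the plane k_t + v1 k1 + v2 k2 ≡ 0 (mod N), and the values on that plane are N times the two-dimensional DFT of the still object.

```python
import numpy as np

N = 16
v = (1, 3)                                        # integer velocity in pixels/frame (assumption)
rng = np.random.default_rng(6)
g = rng.standard_normal((N, N))                   # the still object (random texture)

# video[t, n1, n2] = g[(n1 - v1 t) mod N, (n2 - v2 t) mod N]: circular translation over N frames.
video = np.stack([np.roll(g, (v[0] * t, v[1] * t), axis=(0, 1)) for t in range(N)])
X = np.fft.fftn(video)                            # axes: (k_t, k_1, k_2)

kt, k1, k2 = np.meshgrid(np.arange(N), np.arange(N), np.arange(N), indexing="ij")
on_plane = (kt + v[0] * k1 + v[1] * k2) % N == 0  # the tilted plane
energy = np.abs(X) ** 2
print(energy[~on_plane].sum() / energy.sum())     # fraction of energy off the plane

# On the plane, X equals N times the 2-D DFT of the still object.
K1, K2 = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
plane = X[(-(v[0] * K1 + v[1] * K2)) % N, K1, K2]
print(np.allclose(plane, N * np.fft.fft2(g)))
```

Real sequences break both assumptions (no wrap-around, several objects, a background being covered and uncovered), which is exactly where the shortcomings discussed next come from.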

First, real video scenes do not match the model. The background, which has 
an "untilted" Fourier transform, gets covered and uncovered by the moving object, 
creating spurious frequencies. Then, there are usually several moving objects with 
different motions, thus several tilted planes would be necessary. Finally, most 
of the transforms proposed (such as N-band subband division, where N is not a large integer for complexity reasons) partition the spectrum coarsely and thus cannot approximate the tilted plane very well.

Since coding the spectrum requires coding of one image (or its two-dimensional 
spectrum) plus the direction of the tilted plane, staying in the sequence domain will 
perform just as well. Note also that motion is easier to analyze in the image plane than in the Fourier domain. The argument is simple: compare two images where an object has moved. In the image plane, it is a localized phenomenon described by a single motion vector, while in the spectral domain, it results in a different phase shift of every Fourier component.

The Perceptual Point of View Just as in coding of speech or images, the ultimate
judge of quality is the human observer. Therefore, spatio-temporal models of the 
human visual system (HVS) are important. These turn out to be more complex 
than for static images, especially because of spatio-temporal masking phenomena 
related to motion. If one considers sensitivity to spatio-temporal gratings (sinusoids 
with an offset and various frequencies in all three dimensions), then the eye has 
a lowpass/bandpass characteristic [207]. The sensitivity is maximum at medium 
spatial and temporal frequencies, falls off slightly at low frequencies, and falls off 
rapidly toward high frequencies (note that the sensitivity function is not separable 
in space and time). Finally, sinusoids separated by more than an octave in spatial 
frequency are treated in an independent manner. 

Masking does occur, but it is a very local effect and cannot be well modeled in 
the frequency domain. This masking is both spatial (reduced sensitivity at sharp 
transitions) and temporal (reduced sensitivity at scene changes). The perception of 
motion is a complex phenomenon and psychophysical results are only starting to be 
applicable to coding. One effect is clear and intuitive, however: the perception of a moving object depends on whether it is tracked by the eye or not. While in the latter case,
the object could be blurred without noticeable effect, in the former, the object will 
be perceived as accurately as if it were still. Since it cannot be predicted if the viewer 
will or will not follow the object, one cannot increase compression of moving objects 
by blurring them. This somewhat naive approach has sometimes been suggested 
in conjunction with three-dimensional frequency-domain coding methods, but does 
not work, since more often than not, the interest of the viewer is in the moving 
object. 

Progressive and Interlaced Scanning When thinking of sampling a three-di- 
mensional signal, the most natural sampling lattice seems to be the rectangular 
lattice, as shown in Figure 7.31(a). The scanning corresponding to this lattice 
is called progressive scanning in television cameras and displays. However, for historical and technological reasons, a different sampling called interlaced scanning is often used. It corresponds to a quincunx lattice in the (vertical, time)-plane and its shifted versions along the horizontal axis, as shown in Figure 7.31(b). The name interlaced comes from the fact that even and odd lines are scanned alternately. A set of even or odd lines is called a field, and two successive fields form a frame.

Figure 7.31 Scanning modes used in television. (a) Progressive scanning, which corresponds to the ordinary rectangular lattice. (b) Interlaced scanning, which samples alternately even and odd lines. It corresponds to the quincunx lattice in the (vertical, time)-plane. (c) Face-centered orthorhombic (FCO) lattice, which is the true three-dimensional downsampling by two of the rectangular lattice.

While interlacing complicates a number of signal processing tasks such as mo- 
tion estimation, it represents an interesting compromise between space and time 
resolutions for a given number of sampling points in a space-time volume. Typi- 
cally, high frequencies in both vertical and time dimensions cannot be represented, 
but this loss in resolution is not very noticeable. Progressive scanning would have 
to reduce the sampling rate by two in either dimension in Figure 7.31(a) to achieve 
the same density as in Figure 7.31(b), which is more noticeable than resorting to interlacing.

An even better compromise would be obtained with the face-centered orthorhom- 
bic (FCO) lattice [164], which is the true generalization of the two-dimensional 
quincunx lattice to three dimensions (see Figure 7.31(c)). Then, only frequencies 
which are high in all three dimensions simultaneously are lost, and these are not well 
perceived anyway. However, for technological reasons, FCO is less attractive than 
interlaced scanning. Of course, in the various sampling schemes discussed above, 
one can always construct counterexamples that lose resolution, in particular when tracked by the human observer (for example, objects with high-frequency patterns moving in a worst-case direction). However, these counterexamples are unlikely in real-world imagery, particularly for interlaced and even more so for FCO scanning.⁸

Compatibility In three-dimensional imagery such as television and movies, the 
issue of compatibility between various standards, or at least easy transcoding, has 
become a central issue. For many years, progressive scanning used in movies and 
interlaced scanning used in television and video had an uneasy coexistence, just as 
the 50 Hz frame rate for television in Europe versus 60 Hz frame rate for television in 
US and Japan. Some ad hoc techniques were used to transcode from one standard 
to another, such as the so-called 2/3 pull-down to go from 24 Hz progressively 
scanned movies to 60 Hz interlaced video. 

The advent of digital television with its potential for higher quality, as well as 
the development of new formats (usually referred to as high-definition television, or HDTV) has pushed compatibility to the forefront of current concerns.

Conceptually, multiresolution techniques form an adequate framework to deal 
with compatibility issues [323]. For example, standard television can be seen as a 
subresolution of high definition television (although this is a very rough approxima- 
tion), but with added problems such as different aspect ratios (the ratio of width 
and height of the picture). However, two basic issues make the problem difficult:

Sublattice property Unless the lower-resolution scanning standard is a sublattice of 
the higher-resolution one, it cannot be used directly as a subresolution signal in 
a multiresolution scheme such as a subband coder. Consider the following two 
examples in Figure 7.32. 

First, take as full resolution a 1024 x 1024 progressive sequence at 60 Hz, with a 
512 x 512 interlaced sequence at 60 Hz as subresolution (note that 60 Hz is the frame 
and field rate in the progressive and interlaced case, respectively). The latter exists 
on a sublattice of the former: it is obtained by downsampling by two in the horizontal and vertical dimensions, followed by quincunx downsampling in the (vertical, time)-plane (see Figure 7.32(a)).

8 The famous backward-turning wagon wheels in movies provide an example of aliasing in progressive scanning, which could only be avoided by blurring in time.

Figure 7.32 Sublattice property for compatibility (the (vertical, time)-plane is shown). The "•" represents the original lattice, and the squares the sparser lattice. (a) 1024 x 1024 progressive, 60 Hz, versus 512 x 512 interlaced, 60 Hz. The sublattice property is verified. (b) 1024 x 1024 interlaced, 60 Hz, versus 512 x 512 interlaced, 60 Hz. The sublattice property is not verified.

The second example starts with a 1024 x 1024 interlaced sequence at 60 Hz, from which one would like to obtain a 512 x 512 interlaced one, also at 60 Hz (see Figure 7.32(b)). Half of the points have to be interpolated, since the latter scanning is not a sublattice of the former. It can still be used as a coarse resolution in a pyramid coder, but cannot be used as one of the channels in subband coding.
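The sublattice test can be phrased with generator matrices: the lattice generated by the columns of V_c is a sublattice of the one generated by the columns of V_f exactly when V_f⁻¹ V_c has integer entries. The generators below are our own modeling of Figure 7.32 in the (vertical, time)-plane, in units of one line of the 1024-line frame and one 1/60 s period; they are assumptions, not values read off the figure.

```python
import numpy as np

def is_sublattice(coarse, fine):
    """True if every column of `coarse` is an integer combination of the columns of `fine`."""
    coeffs = np.linalg.solve(fine, coarse)
    return bool(np.allclose(coeffs, np.round(coeffs)))

# Columns are lattice generators (vertical, time).
progressive_1024 = np.array([[1, 0], [0, 1]], dtype=float)  # every line in every period
interlaced_1024 = np.array([[2, 1], [0, 1]], dtype=float)   # fields of 512 lines, offset by one line
interlaced_512 = np.array([[4, 2], [0, 1]], dtype=float)    # fields of 256 lines, offset by two lines

print(is_sublattice(interlaced_512, progressive_1024))  # case (a)
print(is_sublattice(interlaced_512, interlaced_1024))   # case (b)
```

Case (a) is a sublattice, while in case (b) the generator (2, 1) would need half a period of the finer interlaced lattice, which is why half of the points must be interpolated.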



Compatibility as an overconstraint Sometimes, it is stated that all video services from 
videotelephone to HDTV should be embedded in one another, somewhat like Rus- 
sian dolls. That is, the whole video hierarchy can be progressively built up from the 
simplest to the most sophisticated. However, the successive refinement property is a 
constraint with a price [93], and a complete refinement property with stringent bit rate requirements (for example, videotelephone at 64 kbits/sec, standard television at 5 Mbits/sec, and HDTV at 20 Mbits/sec) is quite constrained and might
not lead to the best quality pictures. This is because each of the individual rates 
is a difficult target in itself, and the combination thereof can be an overconstrained 
problem. 

While we will discuss compatibility issues and use multiresolution techniques as one possible way to address the problems, we want to point out that there
is no panacea. Each case of compression with compatibility requirement has to be 
carefully addressed essentially from scratch. 
Figure 7.33 Hybrid motion-compensated predictive DCT coding.



7.4.2 Motion-Compensated Video Coding 



As discussed above, motion models allow a compact description of moving imagery 
and motion prediction permits high compression. Typically, a future frame is predicted from past frames using local motion information. That is, a particular N x N
block of the current frame to be coded is predicted as a displaced N x N block from 
the previous reconstructed frame and the prediction error is compressed using tech- 
niques such as transform coding. The decoder can construct the same prediction 
and add it to the decoded prediction error. Such a scheme is essentially an adaptive 
DPCM over the time dimension, where the predictor is based on motion estima- 
tion. Figure 7.33 shows such a scheme, which is called hybrid motion- compensated 
predictive DCT video coding and is part of several standard coding algorithms [177]. 

As can be seen in Figure 7.33, the prediction error is compressed using the DCT, 
even though there is little correlation left in the prediction error on average. 

Note also that the DCT could be replaced by another expansion such as sub- 
bands (see Figure 7.39(b)). Because of its resemblance to a standard coder, the 
approach will work. However, because motion compensation is done on a block-by- 
block basis (for example, in block matching motion compensation), there can be a 
block structure in the prediction error. Thus, choosing a DCT of the same block size 
is a natural expansion, while taking an expansion that crosses the boundaries could 
suffer from that blocking structure (which creates artificially high frequencies). It 
should not be forgotten, however, that the bulk of the compression comes from the 
motion compensation loop using accurate motion estimates and thus, replacing the 
DCT by a LOT or a discrete wavelet transform can improve the performance, but 
not dramatically. 
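A minimal sketch of one pass of the hybrid loop of Figure 7.33 follows. It assumes that the motion vectors are already known (hypothetical here, for example from block matching), uses an orthonormal 8 × 8 DCT with a uniform quantizer, wraps blocks around the frame edges, and omits entropy coding. What it shows is that the decoder builds exactly the same prediction from the previously reconstructed frame, so the encoder can keep the decoder's reconstruction as its next reference.

```python
import numpy as np

B = 8                                                   # block size N (assumption)

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ X @ C.T is the 2-D DCT of an n x n block X."""
    k, m = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    C[0] /= np.sqrt(2.0)
    return C

C = dct_matrix(B)

def block(frame, i, j):
    """B x B block with top-left corner (i, j); indices wrap at the frame edges (a simplification)."""
    rows = np.arange(i, i + B) % frame.shape[0]
    cols = np.arange(j, j + B) % frame.shape[1]
    return frame[np.ix_(rows, cols)]

def encode(cur, ref, mvs, step):
    """Quantized DCT of the motion-compensated prediction error, block by block."""
    return {(i, j): np.round(C @ (block(cur, i, j) - block(ref, i + di, j + dj)) @ C.T / step)
            for (i, j), (di, dj) in mvs.items()}

def decode(coefs, ref, mvs, step):
    """Same displaced-block prediction plus the decoded prediction error."""
    rec = np.zeros_like(ref)
    for (i, j), (di, dj) in mvs.items():
        rec[i:i + B, j:j + B] = block(ref, i + di, j + dj) + C.T @ (coefs[(i, j)] * step) @ C
    return rec

rng = np.random.default_rng(4)
prev_rec = rng.standard_normal((32, 32))                # previously reconstructed frame
cur = np.roll(prev_rec, (1, 2), axis=(0, 1)) + 0.05 * rng.standard_normal((32, 32))
mvs = {(i, j): (-1, -2) for i in range(0, 32, B) for j in range(0, 32, B)}  # hypothetical vectors
step = 0.1

coefs = encode(cur, prev_rec, mvs, step)
rec = decode(coefs, prev_rec, mvs, step)                # the encoder keeps this as its next reference
print(np.mean((rec - cur) ** 2), step ** 2 / 4)
```

Because the DCT is orthonormal, the mean squared reconstruction error cannot exceed Δ²/4 for a quantizer step Δ, whatever the motion vectors; good motion vectors only reduce how many bits the quantized coefficients cost.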
7.4.3 Pyramid Coding of Video

The difficulty of including motion in three-dimensional subband coding will be 
discussed shortly. It turns out that it is much easier to include motion in pyramid 
coding, due to the fact that the prediction or interpolation from low resolution to 
full resolution (see Figure 7.18) can be an arbitrary predictor [9], such as a motion 
based one. This is a general idea which can be used in various forms for video 
compression and we will describe a particular scheme as an example. 

This video compression scheme was studied in [301, 302, 303]. Consider a pro- 
gressive video sequence and its subresolutions, obtained by spatial filtering and 
downsampling as well as frame skipping over time. Note that filtering over time 
would create so-called "double images" when there is motion and thus straight 
downsampling in time is preferable. This is shown schematically in Figure 7.34(a), 
where the resolution is decreased by a factor of two in each dimension between 
one level of the pyramid and the next. Now we apply the classic pyramid coding 
scheme, which consists of the following: 

(a) Coding the low resolution. 

(b) Predicting the higher resolution based on the coded low resolution. 

(c) Taking the difference between the predicted and the true higher resolution, 
resulting in the prediction error. 

(d) Coding the prediction error. 

While these steps could be done in the three dimensions at once, it is preferable 
to separate the spatial and temporal dimensions. First, the spatial dimension is 
interpolated using filtering and then the temporal dimension is interpolated using 
motion-based interpolation. This is shown in Figure 7.34(b). Following each inter- 
polation step, the prediction error is computed and coded and this coded value is 
added to the prediction before going to the next step. Because at each step, we 
use coded versions for our prediction, we have a pyramid scheme with quantization 
noise feedback, as was described in Figure 7.19. Therefore, there is only one source 
of error, namely the compression of the last prediction error. 
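Steps (a)-(d) and the effect of quantization noise feedback can be illustrated in one dimension. This is a sketch under our own assumptions: plain subsampling by two for the coarse level (as over time, without filtering), linear interpolation as the predictor (a stand-in for the spatial and motion-based interpolators of the scheme), and uniform quantizers with arbitrary step sizes.

```python
import numpy as np

def quantize(v, step):
    """Uniform quantizer with step size `step`."""
    return step * np.round(v / step)

def interpolate(c):
    """Predict a signal of length 2*len(c) from its even samples by linear interpolation."""
    return np.interp(np.arange(2 * len(c)) / 2.0, np.arange(len(c)), c)

rng = np.random.default_rng(5)
x = np.cumsum(rng.standard_normal(64))       # full-resolution signal (a random walk)
coarse = x[::2]                               # low resolution by plain subsampling

coarse_hat = quantize(coarse, 0.5)            # (a) code the low resolution
pred = interpolate(coarse_hat)                # (b) predict from the *coded* low resolution
err = x - pred                                # (c) prediction error
err_hat = quantize(err, 0.25)                 # (d) code the prediction error
x_hat = pred + err_hat                        # decoder: same prediction plus decoded error

# With noise feedback, the reconstruction error is exactly the error of step (d).
print(np.abs(x - x_hat).max(), np.abs(err - err_hat).max())
```

The coarse quantization error of step (a) never reaches the output: it is absorbed into the prediction error of step (c), which is then coded.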

The oversampling inherent in pyramid coding is not a problem in the three-dimensional case, since, following (3.5.4), the total number of samples increases only as

$$\left(1 + \frac{1}{8} + \frac{1}{8^2} + \cdots\right) N < \frac{8}{7} N,$$

that is, by at most 14%, since every coarser level has only 1/8th the number of samples of its predecessor.
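The bound can be checked with exact rational arithmetic. The helper below (a hypothetical name) sums the relative sample counts of a pyramid in which each level has 1/8 of the samples of the level above, as in the three-dimensional case; passing a ratio of 1/4 would model a two-dimensional image pyramid instead.

```python
from fractions import Fraction

def pyramid_samples(levels, ratio=Fraction(1, 8)):
    """Total samples of a `levels`-level pyramid, relative to the N samples of the finest level."""
    return sum(ratio ** k for k in range(levels))

for levels in (2, 3, 4, 10):
    total = pyramid_samples(levels)
    print(levels, total, f"{float(total - 1):.2%} overhead")
```

A three-level pyramid such as the one in Figure 7.34(a) gives 73/64, about 14.1% more samples than N, whereas the two-dimensional ratio 1/4 approaches 4/3, a 33% overhead.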
Figure 7.34 Spatio-temporal pyramid video coding. (a) Three layers of the pyramid, corresponding to three resolutions. (b) Prediction of the higher resolution. The spatial resolution is interpolated first (using linear filtering) and then the temporal resolution is increased using motion interpolation.



The key technique in the spatio-temporal pyramid scheme is the motion interpo- 
lation step, which predicts a frame from its two neighbors based on motion vectors. 
Assume the standard rigid-object and pure translational motion model [207]. If we 
denote the intensity of a pixel at location $r = (x, y)$ and time $t$ by $I(r, t)$, we are looking for a mapping $d(r, t)$ such that we can write

$$I(r, t) = I(r - d(r, t), t - 1).$$

If motion is not changing over time, we also have

$$I(r, t) = I(r + d(r, t), t + 1).$$

The goal is to find the function $d(r, t)$, that is, to estimate the motion. This is a standard estimation procedure, where some simplifying assumptions are made (such as constant motion over a neighborhood). Typically, for a small block $b$ in the current frame, one searches over a set of possible motion vectors such that the sum of squared differences

$$\sum_{r \in b} \left| I(r, t) - \hat{I}(r, t) \right|^2 \tag{7.4.1}$$

is minimized, where

$$\hat{I}(r, t) = I(r - d_b, t - 1) \tag{7.4.2}$$

corresponds to a block in the previous frame displaced by $d_b$ (the motion for the block under consideration in the current frame). It is best to actually perform a symmetric search by considering the past (as in (7.4.2)), the future ((7.4.2) with sign reversals, that is, $I(r + d_b, t + 1)$), and the average

$$\hat{I}(r, t) = \tfrac{1}{2}\left[ I(r - d_b, t - 1) + I(r + d_b, t + 1) \right],$$
and then to choose the best match. Choosing past or future for the interpolation 
is especially important for covering and uncovering of background due to moving 
objects, as well as in case of abrupt changes (scene changes). 
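The full search for (7.4.1), including the symmetric choice among past, future and averaged predictions, can be sketched as follows. Assumptions: grayscale frames as NumPy arrays, a search range of ±3 pixels, displacements that would push either candidate block outside the frame skipped, and a toy sequence whose texture translates by d = (1, 2) pixels per frame. Following (7.4.2), a displacement d means the past prediction is read at r − d and the future one at r + d; among equally good candidates, the first one found is kept.

```python
import numpy as np

def block_match(cur, prev, nxt, i, j, B=8, R=3):
    """Full-search SSD block matching, cf. (7.4.1)-(7.4.2), for the B x B block at (i, j).

    For every displacement d in [-R, R]^2 it tries the past, future and averaged
    predictions and returns (ssd, mode, d) for the best one.
    """
    rows, cols = cur.shape
    target = cur[i:i + B, j:j + B]
    best = (np.inf, None, None)
    for di in range(-R, R + 1):
        for dj in range(-R, R + 1):
            inside = all(0 <= p and p + B <= n for p, n in
                         ((i - di, rows), (i + di, rows), (j - dj, cols), (j + dj, cols)))
            if not inside:
                continue
            past = prev[i - di:i - di + B, j - dj:j - dj + B]      # I(r - d_b, t - 1)
            future = nxt[i + di:i + di + B, j + dj:j + dj + B]     # I(r + d_b, t + 1)
            for mode, pred in (("past", past), ("future", future), ("average", 0.5 * (past + future))):
                ssd = float(np.sum((target - pred) ** 2))
                if ssd < best[0]:
                    best = (ssd, mode, (di, dj))
    return best

# Toy sequence: a random texture translating by d = (1, 2) pixels per frame (assumption).
rng = np.random.default_rng(7)
base = rng.standard_normal((32, 32))
prev, cur, nxt = (np.roll(base, (k, 2 * k), axis=(0, 1)) for k in range(3))
print(block_match(cur, prev, nxt, 12, 12))
```

With noise-free constant motion all three modes match perfectly; the past or future mode becomes decisive when one side of the block is occluded or a scene change occurs.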

Interestingly, a very successful technique to perform motion estimation (that is, finding the displacement $d_b$ that minimizes (7.4.1)) is based on multiresolution
or successive approximation. Instead of solving (7.4.1) directly, one solves a coarse 
version of the same problem, refines the solution (by interpolating the motion vector 
field), and uses this new field as a starting point for a new, finer search. This is not 
only computationally less complex, but also more robust in general [31, 302]. It is 
actually a regularization of the motion estimation problem. 
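A two-level version of the coarse-to-fine search is sketched below. It is an illustration under our own assumptions (2 × 2 averaging to build the coarse level, a ±3 search there, then a ±1 refinement around the doubled vector), not the algorithm of [31, 302]. For a true motion of (4, −6), a single full-resolution search would need a ±6 range, that is, 169 candidate displacements, while this search tests 49 + 9 = 58.

```python
import numpy as np

def ssd_search(cur, prev, i, j, B, center, R):
    """Best displacement d within +-R of `center` for the B x B block at (i, j), using I(r - d, t - 1)."""
    H, W = cur.shape
    target = cur[i:i + B, j:j + B]
    best, best_d = np.inf, center
    for di in range(center[0] - R, center[0] + R + 1):
        for dj in range(center[1] - R, center[1] + R + 1):
            if 0 <= i - di <= H - B and 0 <= j - dj <= W - B:
                ssd = np.sum((target - prev[i - di:i - di + B, j - dj:j - dj + B]) ** 2)
                if ssd < best:
                    best, best_d = ssd, (di, dj)
    return best_d

def downsample(f):
    """2 x 2 block averaging followed by subsampling by two."""
    return 0.25 * (f[0::2, 0::2] + f[1::2, 0::2] + f[0::2, 1::2] + f[1::2, 1::2])

def coarse_to_fine(cur, prev, i, j, B=16, R=3):
    # Coarse level: half resolution, half-size block, small search range around zero.
    dc = ssd_search(downsample(cur), downsample(prev), i // 2, j // 2, B // 2, (0, 0), R)
    # Fine level: double the coarse vector and refine it by +-1 pixel.
    return ssd_search(cur, prev, i, j, B, (2 * dc[0], 2 * dc[1]), 1)

rng = np.random.default_rng(8)
prev = rng.standard_normal((64, 64))
cur = np.roll(prev, (4, -6), axis=(0, 1))        # true motion d = (4, -6): I(r, t) = I(r - d, t - 1)
print(coarse_to_fine(cur, prev, 24, 24))
```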

As an illustration of this video coding scheme, a few representative pictures are 
shown. First, Figure 7.35 shows the successive refinement of the motion vector field, 
which starts with a sparse field on a coarse version and refines it to a fine field on the 
full-resolution image. In Figure 7.36, we show the resulting spatial and temporal 
prediction error signals. As can be seen, the spatial prediction error has higher 
energy than the temporal one, which shows that temporal interpolation based on 
motion is quite successful (actually, this sequence has high frequency spatial details, 
which cannot be well predicted from the coarse resolution). 

A point to note is that the first subresolution sequence (which is downsampled by 
2 in each dimension) is of good visual quality and could be used for a compatible 
coding scheme. This coding scheme was implemented for high quality coding of 
HDTV with a compatible subchannel and it performed well at medium compression 
(of the order of 10-15 to 1) with essentially no visible degradation [301, 303]. 

7.4.4 Subband Decompositions for Video Representation and Compression 

Decompositions for Representation We will discuss here two ways of sampling video by two: the first uses quincunx sampling in the (vertical, time)-plane, and the second is true three-dimensional sampling by two, using the FCO sampling lattice.

Quincunx sampling for scanning format conversions We have outlined previously the 
existence of different scanning standards (such as interlaced and progressive) as well 
as the desire for compatibility. A simple technique to deal with these problems is 
to use perfect reconstruction filter banks to go back and forth between progressive and interlaced scanning, as shown in Figure 7.37 [320]. This is achieved by quincunx downsampling the channels in the (vertical, time)-plane. Properly designed filter pairs (either orthogonal or biorthogonal solutions) lead to a lowpass channel that is a usable interlaced sequence, while the original sequence can be perfectly recovered when using both the lowpass and highpass channels in the reconstruction. This is a compatible solution in the following sense: a low-quality receiver would only decode the lowpass channel and thus show an interlaced sequence, while a high-quality receiver would synthesize a full-resolution progressive sequence based on both the lowpass and the highpass channels.

Figure 7.35 Multiresolution motion vector fields used in the interpolation. Each corresponds to a layer in the pyramid, with coarse (top left), medium (top right) and fine (bottom) resolutions.
Figure 7.36 Results of spatio-temporal coding of video (after [301]). The spatial (left) and temporal (right) prediction errors are shown. The reconstruction (not shown) is indistinguishable from the original at the rate used in this experiment (around 1.0 bits/pixel).

Figure 7.37 Progressive to interlaced conversion using a two-channel perfect reconstruction filter bank with quincunx downsampling.



If one starts with an interlaced sequence, one can obtain a progressive sequence 
by quincunx downsampling. Thus, an interlaced sequence can be broken into low- 
pass and highpass progressive sequences, again allowing perfect reconstruction when 
perfect reconstruction filter banks are used. This is a very simple, linear technique 
to produce a deinterlaced sequence (the lowpass signal) as well as a helper signal 
(the highpass signal) from which to reconstruct the original signal. While more powerful motion-based techniques can produce better results, the above technique is attractive because of its low complexity and because no motion model needs to be assumed.
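Which samples the quincunx lattice in the (vertical, time)-plane keeps can be seen on a toy grid. As an assumption, a sample (frame t, line y) is kept when t + y is even, one of the two cosets; the other coset carries the complementary field.

```python
import numpy as np

frames, lines = 4, 6                                   # toy progressive sequence size (assumption)
t, y = np.meshgrid(np.arange(frames), np.arange(lines), indexing="ij")
keep = (t + y) % 2 == 0                                # quincunx lattice in the (vertical, time)-plane

fields = [np.flatnonzero(keep[f]).tolist() for f in range(frames)]
print(fields)                                          # lines kept in each frame
```

Even lines are kept in even frames and odd lines in odd frames, which is exactly the field structure of interlaced scanning, with half of the original samples.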
Perfect reconstruction filter banks for these applications, in particular low-complexity ones, have been designed in [320]. Both orthogonal and biorthogonal solutions are given. As an example, we give the two-dimensional impulse responses of a simple linear phase filter pair,

$$h_0[n_1,n_2] = \frac{1}{32}\begin{pmatrix} 0 & 0 & -1 & 0 & 0 \\ 0 & -2 & 4 & -2 & 0 \\ -1 & 4 & 28 & 4 & -1 \\ 0 & -2 & 4 & -2 & 0 \\ 0 & 0 & -1 & 0 & 0 \end{pmatrix}, \qquad h_1[n_1,n_2] = \frac{1}{4}\begin{pmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \tag{7.4.3}$$
which are lowpass and highpass filters, respectively. Since it is a biorthogonal pair, the synthesis filters (if the above are used for analysis) are obtained by modulation with $(-1)^{n_1+n_2}$, and thus the roles of lowpass and highpass are reversed (see also Problem 7.7).
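A quick numerical sanity check of this pair is sketched below. As assumptions, it uses the normalizations 1/32 and 1/4, indexes taps from the center of each array, and applies a common quincunx biorthogonality test: the product of the lowpass filter with the modulated highpass filter, (−1)^(n1+n2) h1[n1, n2], should vanish at every nonzero lag with n1 + n2 even (with the highpass channel taken one sample off the lowpass lattice, and up to an overall sign).

```python
import numpy as np

h0 = np.array([[0, 0, -1, 0, 0],
               [0, -2, 4, -2, 0],
               [-1, 4, 28, 4, -1],
               [0, -2, 4, -2, 0],
               [0, 0, -1, 0, 0]]) / 32.0
h1 = np.array([[0, 1, 0],
               [1, -4, 1],
               [0, 1, 0]]) / 4.0

def modulate(h):
    """Multiply h[n1, n2] by (-1)^(n1 + n2), with (n1, n2) measured from the filter's center."""
    n1, n2 = np.meshgrid(np.arange(h.shape[0]) - h.shape[0] // 2,
                         np.arange(h.shape[1]) - h.shape[1] // 2, indexing="ij")
    return h * (-1.0) ** (n1 + n2)

def conv2(a, b):
    """Full 2-D linear convolution (small arrays, no SciPy needed)."""
    out = np.zeros((a.shape[0] + b.shape[0] - 1, a.shape[1] + b.shape[1] - 1))
    for i in range(b.shape[0]):
        for j in range(b.shape[1]):
            out[i:i + a.shape[0], j:j + a.shape[1]] += b[i, j] * a
    return out

p = conv2(h0, modulate(h1))                      # product filter, centered at index (3, 3)
m1, m2 = np.meshgrid(np.arange(7) - 3, np.arange(7) - 3, indexing="ij")
even = (m1 + m2) % 2 == 0
origin = (m1 == 0) & (m2 == 0)
print(p[3, 3], np.abs(p[even & ~origin]).max())  # center tap, largest other even-parity tap
```

The center tap has unit magnitude and every other even-parity tap is zero, so the product behaves as a "halfband" filter on the quincunx lattice.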

FCO sampling for video representation We mentioned previously that using the FCO 
lattice (depicted in Figure 7.31(c)) might produce visually more pleasing sequences 
if a data reduction by two is needed. This is due in part to the fact that an ideal 
lowpass in the FCO case would retain more of the energy of the original signal than 
the corresponding quincunx lowpass filter. Actually, assuming that the original 
signal has a spherically uniform spectrum, and that the ideal lowpass filters are 
Voronoi regions both in the quincunx and the FCO cases, the quincunx lowpass 
would retain 84.3% of the original spectrum, while the FCO lowpass would retain 
95.5% of the original spectrum [164]. 

To evaluate the gain of processing a video signal with a true three-dimensional 
scheme when a data rate reduction of two is needed, we can use a two-channel 
perfect reconstruction filter bank [164]. The sampling matrix is $D_{\mathrm{FCO}}$, and the perfect reconstruction filter pair is a generalization of the above diamond-shaped quincunx filters to three dimensions. To compare the low bands obtained
in this manner, they are interpolated back to the original lattice, since we cannot 
observe the FCO output directly. Upon observing the result, the conclusion is that 
FCO produces visually more pleasing sequences. For more detail, see [164]. 

Three-Dimensional Subband Decomposition for Compression A straightfor- 
ward generalization of separable subband decomposition to three dimensions is 
shown in Figure 7.38, with the separable filter tree shown in part (a) and the slicing of the spectrum given in part (b) [153]. In general, most of the energy will be contained in the band that has gone through lowpass filtering in all three directions; thus, iterating the decomposition on this band is most natural. This is actually a three-dimensional discrete-time wavelet decomposition and is used in [153, 224]. Such three-dimensional decompositions work best for isotropic data, such as tomographic images used in medical imaging or multispectral images used in satellite imagery. In that case, the same filters can be used in each dimension, together with the same compression strategy (at least as a first approximation).

Figure 7.38 Three-dimensional subband decomposition of video. (a) Separable filter bank tree. LP and HP stand for lowpass and highpass filtering, respectively, and the circle indicates downsampling by two. (b) Slicing of the three-dimensional spectrum.

As we said, in video sequences, time should be treated differently from the 
spatial dimensions. Typically, only very short filters are used along time (such as 
Haar filters given in (3.1.2) and (3.1.17)) since long filters will smear motion in the 
lowpass channel and create artificial high frequencies in the highpass channel. If 
one looks at the output of a three-dimensional subband decomposition, one can 
note that the lowpass version is similar to the original and the only other channel 
with substantial energy is the one containing a highpass filter over time followed by 
lowpass filters in the two spatial dimensions. This channel contains energy every 
time there is substantial motion and can be used as a motion indicator. 
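Both observations can be illustrated on a synthetic sequence. The sketch below applies one level of a separable three-dimensional Haar decomposition, as in Figure 7.38(a), and measures the energy in each of the eight subbands. The test sequence (a smooth Gaussian blob moving horizontally) and the function names are illustrative assumptions. Subband keys give 'L' or 'H' along time, then vertically, then horizontally, so 'HLL' is the temporal highpass, spatial lowpass channel.

```python
import numpy as np

def haar_split(x, axis):
    """One orthonormal Haar analysis step along `axis`, including the
    downsampling by two (assumes an even length along `axis`)."""
    even = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
    odd = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar3d(video):
    """One level of a separable 3-D Haar decomposition of a (t, y, x) array.
    Returns 8 subbands keyed by 'L'/'H' along t, then y, then x."""
    bands = {"": video}
    for axis in range(3):
        split = {}
        for key, band in bands.items():
            split[key + "L"], split[key + "H"] = haar_split(band, axis)
        bands = split
    return bands

def moving_blob(speed, frames=8, size=64, sigma=3.0):
    """Hypothetical test sequence: a Gaussian blob moving `speed` pixels/frame."""
    y, x = np.mgrid[0:size, 0:size]
    return np.stack([np.exp(-((y - 32) ** 2 + (x - 16 - speed * t) ** 2)
                            / (2 * sigma ** 2)) for t in range(frames)])

def band_energies(video):
    """Energy of each subband; the orthonormal Haar transform preserves the total."""
    return {key: float(np.sum(b ** 2)) for key, b in haar3d(video).items()}

for speed in (0, 4):
    e = band_energies(moving_blob(speed))
    total = sum(e.values())
    top3 = sorted(e, key=e.get, reverse=True)[:3]
    print(speed, [(k, round(e[k] / total, 3)) for k in top3])
```

For the static sequence every temporal highpass band is exactly zero, while for the moving blob the 'HLL' band carries the largest share after 'LLL', which is what makes it usable as a motion indicator.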

Although motion-compensated methods can outperform subband decomposition along
the time axis, some promising results have been reported recently [223, 286]. Also,
subband decomposition over time is a simple, low-complexity method and can easily be
used in a joint source-channel coding environment because of the natural ordering in
importance of the subbands [323]. Subband representation is also very convenient for
hierarchical decomposition and coding [35] and has been used for compression of
HDTV [336].

Figure 7.39 Motion-compensated subband coding. SB: subband, ME: motion
estimation, MC: motion compensation, MCL: motion-compensation loop. (a)
Motion compensation of each subband. (b) Subband decomposition of the
motion-compensated prediction error.



Motion and Subband Coding Intuitively, instead of lowpass and highpass filtering
along the time axis, one should filter along the direction of motion.
Then, motion itself would not create artificial high frequencies as it does in straight 
three-dimensional subband coding. This view, although conceptually appealing, is 
difficult to translate into practice, except in very limited cases (such as panning, 
which corresponds to a single translational motion). In general, there are different 
motion trajectories as well as covering and uncovering of background by moving 
objects. Thus, subband decomposition along motion trajectories is not a practical 
approach (see [167] for further discussions on this topic). 

Instead, one has to go back to more traditional motion-compensation techniques 
and see how they fit into a subband coding framework or, conversely, how subband 
coding can be used within a motion-compensated coder [110]. Consider inclusion of 
motion compensation into a subband decomposition. That is, instead of processing 
the time axis using Haar filters, we use a motion-compensation loop in each of the 
four spatial bands. One advantage is that the four channels are now treated in an 
independent fashion. While this scheme should perform better than the straight 
three-dimensional decomposition, it also has a number of drawbacks. First, motion 
compensation requires motion estimation. If it is done in the subbands, it is less
accurate than the motion estimates obtained from the original sequence. Also,
motion estimation in the high frequency subbands will be difficult. Thus, motion
estimation should probably be done on the original sequence and the estimates
then used in each band after proper rescaling (see Figure 7.39(a)). One of the
attractive features of the original scheme, namely that motion processing is done
in parallel and at a lower resolution, is thus partly lost, since motion estimation
is now shared. Moreover, it is hard to perform motion compensation in the high
frequency subbands, since they mostly consist of edge information and thus slight
motion errors lead to large prediction errors.

Table 7.1 Comparison of subband and pyramid coding of video. $N$ is the number
of channels in the subband decomposition and $\delta$ is the quantizer step size.

Method                    Subband              Pyramid
Oversampling              0%                   < 14%
Maximum coding error      $\sqrt{N}\,\delta$   $\delta$
Subchannel quality        Limited              Good
Inclusion of motion       Difficult            Easy
Nonlinear processing      Difficult            Easy
Model-based processing    Difficult            Easy
Encoding delay            Moderate             Large

As can be seen from the above discussion, motion compensation in the subbands
is not easy. An intuitive explanation is the following: motion, that is, translation of
objects, is a sequence-domain phenomenon. Going to a subband domain is similar to
going into the frequency domain, but there, translation is a complex phenomenon, with
different phase factors at different frequencies. This shows that motion estimation
and compensation are more difficult in the subband domain than in the original
sequence domain.
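The phase-factor argument can be checked numerically. This sketch uses circular shifts for simplicity (an assumption that avoids boundary effects). It verifies that a translation multiplies each DFT coefficient by a frequency-dependent phase, and that after Haar lowpass filtering and downsampling by two, a one-sample shift of the input is no longer a shift of the subband signal, whereas a two-sample shift is.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
x = rng.standard_normal(N)
k = np.arange(N)

def shift(x, d):
    """Circular translation: result[n] = x[n - d]."""
    return np.roll(x, d)

def haar_lowpass(x):
    """Haar lowpass filtering followed by downsampling by two."""
    return (x[0::2] + x[1::2]) / np.sqrt(2)

# In the DFT domain a shift by d is a phase factor exp(-j 2 pi k d / N),
# different at every frequency k.
d = 1
phase = np.exp(-2j * np.pi * k * d / N)
print(np.allclose(np.fft.fft(shift(x, d)), phase * np.fft.fft(x)))   # True

low = haar_lowpass(x)
# Odd shift: the subband is not a shifted copy of the original subband.
odd = haar_lowpass(shift(x, 1))
print(any(np.allclose(odd, np.roll(low, s)) for s in range(N // 2)))  # False
# Even shift by 2: the subband is simply shifted by 1.
print(np.allclose(haar_lowpass(shift(x, 2)), np.roll(low, 1)))        # True
```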

Consider the alternative of using subband decomposition within a motion-compensated
coder, as shown in Figure 7.39(b). The subband decomposition is used to
decompose the prediction error signal spatially and simply replaces the DCT that
is usually present in such a hybrid motion-compensated DCT coder. This approach
was discussed in Section 7.4.2, where we indicated its feasibility, but also some of 
its possible shortcomings. 



Comparison of Subband and Pyramid Coding for Video Because both sub- 
band and pyramid coding of video are three-dimensional multiresolution decom- 
positions, it is natural to compare them. A slight disadvantage of pyramid over 
subband coding is the oversampling; however, it is small in this three-dimensional 
case. Also, the encoding delay is larger in pyramid coding than in subband coding. 
On all other counts, pyramid coding turns out to be advantageous when compared
to subband coding, a somewhat astonishing fact considering the simplicity of the
pyramid approach. First, quantization error is easy to control using quantization
error feedback, which leads to a tight bound on the maximum possible error,
unlike in transform or subband coders. Second, the inclusion of motion, which we
found to be difficult in subband coding, is very simple in a pyramidal scheme, as
demonstrated in the spatio-temporal scheme discussed previously. Third, the quality
of a compatible subchannel is limited in a subband scheme due to the constrained
filters that are used. In the pyramid case, however, the freedom in choosing the
filters both before downsampling and for interpolation can be used to obtain
visually pleasing coarse resolutions as well as good quality interpolated versions,
a useful feature for compatibility. The above comparison is summarized
in Table 7.10.
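The effect of quantization error feedback can be seen on a one-dimensional, two-level pyramid. In this sketch the averaging decimator, the sample-repetition interpolator, and the step sizes are arbitrary illustrative choices. With feedback, the encoder predicts from the quantized coarse signal, exactly as the decoder does, so the final error is that of the difference quantizer alone and is bounded by half its step size.

```python
import numpy as np

def quantize(v, step):
    """Uniform quantizer with reconstruction levels at multiples of `step`."""
    return step * np.round(v / step)

def down(x):
    """Fine-to-coarse: average sample pairs (lowpass filter + downsampling)."""
    return 0.5 * (x[0::2] + x[1::2])

def up(c):
    """Coarse-to-fine: repeat each sample (a simple interpolator)."""
    return np.repeat(c, 2)

def pyramid_code(x, step_coarse, step_diff, feedback=True):
    """Two-level pyramid coder; returns the decoder's reconstruction of x."""
    coarse_q = quantize(down(x), step_coarse)       # sent to the decoder
    # With feedback, the prediction uses the *quantized* coarse signal,
    # i.e., the same prediction the decoder can form.
    prediction = up(coarse_q) if feedback else up(down(x))
    diff_q = quantize(x - prediction, step_diff)    # sent to the decoder
    return up(coarse_q) + diff_q                    # decoder-side reconstruction

rng = np.random.default_rng(1)
x = np.cumsum(rng.standard_normal(1024))            # a correlated test signal
for fb in (True, False):
    err = np.max(np.abs(x - pyramid_code(x, 0.5, 0.1, feedback=fb)))
    print("feedback" if fb else "no feedback", round(float(err), 3))
```

Without feedback, the coarse quantization error (up to half of `step_coarse`) adds to the decoder's error, so the bound set by `step_diff` is lost.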

7.4.5 Example: MPEG Video Compression Standard 

Just as in image compression, where several key ideas led to the JPEG standard (see 
Section 7.3.1), the work on video compression led to the development of a successful 
standard called MPEG [173, 201]. Currently, MPEG comes in two versions, namely
a "coarse" version called MPEG-I (for noninterlaced television at 30 frames/second
and a compressed bit rate of the order of 1 Mbits/sec) and a "finer" version named
MPEG-II (for 60 fields/sec regular interlaced television and a compressed bit rate
of 5 to 10 Mbits/sec). The principles used in both versions are very similar and
we will concentrate on MPEG-I in the following. What makes MPEG both interesting
and powerful is that it combines several of the ideas discussed in image
and video compression earlier in this chapter. In particular, it uses both hybrid
motion-compensated predictive DCT coding (for a subset of frames) and bidirectional
motion interpolation (as was discussed in the context of video pyramids).
But first, it segments the infinite sequence of frames into temporal blocks called
groups of pictures (GOPs). A GOP typically consists of 15 frames (that is, half a
second of video). The first frame of a GOP is coded using standard image compression
and no prediction from past frames (this decouples the GOP from the past
and allows one to decode a GOP independently of other GOPs). This intraframe
coded image, or I-frame, is used as the start frame of a motion-compensation loop
that predicts every N-th frame in the GOP, where N is typically two or three.
The predicted frames (P-frames) are then used together with the I-frame in order
to interpolate the N - 1 intermediate frames (called B-frames because the
interpolation is bidirectional) between the P-frames. A GOP, the various frame types,
and their dependencies are shown in Figure 7.40.
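The GOP structure just described can be sketched as a small Python program. The defaults (nine frames, a P-frame every N = 3 frames) mirror Figure 7.40. Treating the trailing B-frames as referencing the next GOP's I-frame (index `length`) is an assumption made here for illustration, and the function names are hypothetical.

```python
def gop_frame_types(length=9, n=3):
    """Display-order frame types: I first, then a P-frame every n frames,
    with n - 1 B-frames in between."""
    return "".join("I" if i == 0 else "P" if i % n == 0 else "B"
                   for i in range(length))

def gop_references(length=9, n=3):
    """Map each frame index to the frames it is predicted from.
    Index `length` stands for the I-frame of the next GOP (an assumption)."""
    refs = {}
    for i in range(length):
        if i == 0:
            refs[i] = []                     # I-frame: intraframe coded
        elif i % n == 0:
            refs[i] = [i - n]                # P-frame: motion-compensated prediction
        else:
            anchor = (i // n) * n            # previous I- or P-frame
            refs[i] = [anchor, anchor + n]   # B-frame: bidirectional interpolation
    return refs

print(gop_frame_types())        # IBBPBBPBB
print(gop_frame_types(15, 3))   # IBBPBBPBBPBBPBB
print(gop_references())
```

Because each B-frame needs the anchor that follows it, an encoder must send anchors ahead of the B-frames that depend on them.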

Figure 7.40 A group of pictures (GOP) in the MPEG video coding standard.
I, P, and B stand for intra, predicted, and bidirectionally interpolated frames,
respectively. There are nine frames in this GOP, with two B-frames between
every P-frame. The arrows show the dependencies between frames.

Both the intraframe and the various prediction errors (corresponding to the
difference between the true frame and its prediction either from the past or from
its neighbors in the P and B case, respectively) are compressed using a JPEG-like
standard (DCT, quantization with an appropriate quantization matrix, and zigzag
scanning with entropy coding). One important difference, however, is that the
quantization matrix can be scaled by a multiplicative factor and this factor is sent
as overhead. This allows a coarse form of adaptive quantization if desired.
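The effect of the multiplicative scale factor can be sketched on a single 8x8 block. The orthonormal DCT matches the JPEG-like transform described above. The quantization matrix `Q_HYPOTHETICAL`, however, is an arbitrary example whose step sizes grow with frequency; it is not the MPEG default matrix.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C; the 2-D DCT of a block B is C @ B @ C.T."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

# Hypothetical quantization matrix: coarser steps at higher frequencies.
i, j = np.indices((8, 8))
Q_HYPOTHETICAL = 8.0 + 4.0 * (i + j)

def code_block(block, qmatrix, scale):
    """Quantize the 2-D DCT of `block` with step sizes scale * qmatrix.
    Returns the integer indices and the reconstructed block."""
    c = dct_matrix(block.shape[0])
    steps = scale * qmatrix
    indices = np.round((c @ block @ c.T) / steps).astype(int)
    recon = c.T @ (indices * steps) @ c
    return indices, recon

rng = np.random.default_rng(2)
block = 128 + np.cumsum(rng.standard_normal((8, 8)) * 8, axis=1)  # test block
for scale in (0.5, 1.0, 4.0):
    idx, recon = code_block(block, Q_HYPOTHETICAL, scale)
    print(scale, np.count_nonzero(idx), round(float(np.mean((block - recon) ** 2)), 2))
```

Only the single scale factor has to be transmitted to move the whole matrix toward finer or coarser quantization. A larger factor never increases the number of nonzero indices.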

A key for good compression performance is good motion estimation/prediction. 
In particular, motion can be estimated at different accuracies (motion by integer 
pixel distances, or finer, subpixel accuracy). Of course, finer motion information 
increases the overhead to be sent to the decoder, but typically, the reduction in 
prediction error justifies this finer motion estimation and prediction. For example, 
it is common to use half-pixel accuracy motion estimation in MPEG. 
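A minimal sketch of full-search block matching with half-pixel accuracy follows. It assumes bilinear interpolation of the reference frame and a sum-of-squared-errors criterion. Both are common choices, stated here as assumptions rather than as what MPEG mandates. Displacements are searched on a half-pixel grid; restricting the search to even grid positions gives integer-pixel accuracy.

```python
import numpy as np

def upsample2_bilinear(frame):
    """Bilinear interpolation onto a half-pixel grid: output[2i, 2j] = frame[i, j];
    the other positions average the neighboring integer-position samples."""
    h, w = frame.shape
    up = np.zeros((2 * h - 1, 2 * w - 1))
    up[0::2, 0::2] = frame
    up[1::2, 0::2] = 0.5 * (frame[:-1, :] + frame[1:, :])
    up[0::2, 1::2] = 0.5 * (frame[:, :-1] + frame[:, 1:])
    up[1::2, 1::2] = 0.25 * (frame[:-1, :-1] + frame[1:, :-1]
                             + frame[:-1, 1:] + frame[1:, 1:])
    return up

def match_block(cur, ref, top, left, size=8, search=4, half_pel=True):
    """Full-search block matching of the size x size block of `cur` at
    (top, left) against `ref`. Returns ((dy, dx), sse): the displacement in
    pixels (multiples of 0.5 if half_pel) with the smallest squared error."""
    block = cur[top:top + size, left:left + size]
    up = upsample2_bilinear(ref)
    step = 1 if half_pel else 2            # step on the half-pixel grid
    best, best_err = None, np.inf
    for dy2 in range(-2 * search, 2 * search + 1, step):
        for dx2 in range(-2 * search, 2 * search + 1, step):
            y0, x0 = 2 * top + dy2, 2 * left + dx2
            y1, x1 = y0 + 2 * (size - 1), x0 + 2 * (size - 1)
            if y0 < 0 or x0 < 0 or y1 >= up.shape[0] or x1 >= up.shape[1]:
                continue                   # candidate falls outside the frame
            cand = up[y0:y1 + 1:2, x0:x1 + 1:2]
            err = float(np.sum((block - cand) ** 2))
            if err < best_err:
                best, best_err = (dy2 / 2, dx2 / 2), err
    return best, best_err

def scene(y, x):
    """Hypothetical smooth scene, defined at any (sub)pixel position."""
    return np.cos(0.3 * x) + np.sin(0.2 * y + 0.1 * x)

yy, xx = np.mgrid[0:32, 0:32].astype(float)
ref = scene(yy, xx)
cur = scene(yy, xx + 0.5)        # content displaced by half a pixel
print(match_block(cur, ref, 12, 12, half_pel=False))
print(match_block(cur, ref, 12, 12, half_pel=True))
```

The half-pixel search tests every integer candidate as well, so its error can only be smaller. The price is four times as many candidates and one more bit per motion component to transmit.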



7.5 Joint Source-Channel Coding 

The source coding methods we have discussed so far are used in order to transport 
information (such as a video sequence) over a channel with limited capacity (such 
as a telephone line which can carry up to 20 Kbits/sec). In many situations, source 
coding can be performed separately from channel coding, which is known as the 
separation principle of source and channel coding. For example, in a point-to-point 
transmission using a known, time-invariant channel such as a telephone line, one 
can design the best possible channel coding method to approach channel capacity, 
that is, achieve a rate R in bits/sec such that R < C, where C is the channel capacity
[258]. Then, the task of the source compression method is to reduce the bit rate so
as to match the rate of the channel. 

However, there exist other situations where a separation principle cannot be 
used. In particular, when the channel is time-varying and there is a delay con- 
straint, or when multiple channels are present as in broadcast or multicast, it can 
be advantageous to jointly design the source and channel coding so that, for example,
several transmission rates are possible.

The development of such methods is beyond the scope of this book. As an 
example, the case of multiple channels falls into a well-studied branch of information
theory called multiuser information theory [66]. Instead, we will show several
examples indicating how multiresolution source coding fits naturally into joint
source-channel coding methods. In all these examples, the transmission, or channel 
coding, uses a principle we call multiresolution transmission and can be seen as the 
dual of multiresolution source coding. 

Multiresolution transmission is based on the idea that a transmission system 
can operate at different rates, depending on the channel conditions, or that certain 
bits will be better protected than others in case of adverse channel conditions. Such 
a behavior of the transmission system can be achieved using different techniques, 
depending on the transmission media. For example, unequal error protection codes 
can be used, thus making certain bits more robust than others in the case of a 
noisy channel. The combination of such a transmission scheme with a multires- 
olution source coder is very natural. The multiresolution source coder segments 
the information into a part which reconstructs a coarse, first approximation of the 
signal (such as the lowpass channel in a subband coder) as well as a part which 
gives the additional detail signal (typically, the higher frequencies). The coarse 
approximation is now sent using the highly protected bits and has a high prob- 
ability of arriving successfully, while the detail information will only arrive if the 
channel condition is good. The scheme generalizes to more levels of quality in an 
obvious manner. This intuitive matching of successive approximations of the source
to different transmission rates, depending on the quality of the channel, is called
multiresolution joint source-channel coding.

7.5.1 Digital Broadcast 

As a first example, we consider digital broadcast. This is a typical instance of a 
multiuser channel, since a single emitter sends to many users, each with a different 
channel. One can of course design a digital communication channel that is geared 
to the worst case situation, but that is somewhat of a waste for the users with 
better channels. For simplicity, consider two classes of users $U_1$ and $U_2$ having
"good" and "bad" channels, with capacities $C_1 > C_2$, respectively. Then, the idea
is to superimpose information for the users with the good channel on top of the
information that can be received by the users with the bad channel (which can
also be decoded by the former class of users) [66]. Interestingly, this simple idea
improves the joint capacity of both classes of users over simply multiplexing between
the two channels (sending information at rate $R_1 < C_1$ to $U_1$ part of the time, and
then at rate $R_2 < C_2$ to $U_1$ and $U_2$ the rest of the time). See Figure 7.41(a) for
a graphical description of the joint capacity region and Figure 7.41(b) for a typical
constellation used in digital transmission, where information for the users with better
channels is superimposed over information that can be received by both classes
of users. Now, keeping our multiresolution paradigm in mind, it is clear that we
can send coarse signal information to both classes of users, while superposing detail
information that can be received by the users with the good channel. In [231], a
digital broadcast system for HDTV was designed using these principles, including
multiresolution video coding [301] and multiresolution transmission with graceful
degradation (using constellations similar to the one in Figure 7.41(b)).

Figure 7.41 Digital broadcast. (a) Joint capacity region for two classes of users
with channel capacities $C_1$ and $C_2$, respectively, and $C_1 > C_2$. Any point on
or below the curves is achievable, but superposition outperforms multiplexing.
(b) Example of a signal constellation (showing amplitudes of cosine and sine
carriers in a digital communication system) using superposition of information.
As can be seen, there are four clouds of four points each. When the channel is
good, 16 points can be distinguished (or four bits of information), while under
adverse conditions, only the clouds are seen (or two bits of information).
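A constellation like the one in Figure 7.41(b) can be sketched as follows. The cloud and point spacings (`cloud_sep`, `point_sep`), the bit-to-point mapping, and the Gaussian noise model are assumptions for illustration. The simulation checks that, under heavy noise, the two coarse bits (which cloud) are far more reliable than the two detail bits (which point inside the cloud).

```python
import numpy as np

def superposition_constellation(cloud_sep=4.0, point_sep=1.0):
    """Hypothetical 16-point superposition constellation. Bits (b0, b1) pick
    one of four clouds (coarse information), bits (b2, b3) pick one of four
    points inside the cloud (detail information). Points are complex numbers
    (cosine and sine carrier amplitudes)."""
    points = {}
    for b0 in (0, 1):
        for b1 in (0, 1):
            center = complex(2 * b0 - 1, 2 * b1 - 1) * cloud_sep / 2
            for b2 in (0, 1):
                for b3 in (0, 1):
                    offset = complex(2 * b2 - 1, 2 * b3 - 1) * point_sep / 2
                    points[(b0, b1, b2, b3)] = center + offset
    return points

def decode(received, points):
    """Minimum-distance decoding: returns the 4-bit label of the nearest point."""
    return min(points, key=lambda bits: abs(received - points[bits]))

def bit_error_rates(noise_std, n=2000, seed=0):
    """Simulated error rates of the coarse bits and of the detail bits."""
    rng = np.random.default_rng(seed)
    points = superposition_constellation()
    labels = list(points)
    coarse = detail = 0
    for _ in range(n):
        bits = labels[rng.integers(len(labels))]
        noise = noise_std * complex(rng.standard_normal(), rng.standard_normal())
        est = decode(points[bits] + noise, points)
        coarse += (est[0] != bits[0]) + (est[1] != bits[1])
        detail += (est[2] != bits[2]) + (est[3] != bits[3])
    return coarse / (2 * n), detail / (2 * n)

for std in (0.1, 0.8):                 # a "good" and a "bad" channel
    print(std, bit_error_rates(std))
```

With this symmetric layout the coarse bits are simply the signs of the two carrier amplitudes, so a receiver on a poor channel can stop at the cloud decision.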

The principles just described can be used for transmission over unknown time-varying
channels. Instead of transmitting under the assumption of a worst-case channel,
one can superpose information decodable on a better channel, in case the channel is
actually better than the worst case. On average, this will be better than simply assuming
the worst case all the time. As an example, consider a wireless channel without feedback.
Because of the changing location of the user, the channel can vary greatly, and
the worst-case channel can be very poor. Superposition allows delivery of different
levels of quality, depending on how good the reception actually is. When there is
feedback (as in two-way wireless communication), one can use a channel code
optimized for the current channel (see [114]). The source coder then has to adapt to
the current transmission rate, which again is easy to achieve using multiresolution
source coding. A study of wireless video transmission using a two-resolution video
source coder can be found in [157].
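For the Gaussian case, the gain of superposition can be quantified with the standard rate expressions for a degraded Gaussian broadcast channel. The power, noise levels, and probability of a good channel below are arbitrary assumptions. The sketch checks two claims from this section: the superposition rate pair lies beyond the time-sharing line, and over a channel that is good only part of the time, superposition delivers a higher expected rate than a worst-case design.

```python
import numpy as np

def capacity(snr):
    """Capacity of a real additive white Gaussian noise channel, bits/use."""
    return 0.5 * np.log2(1.0 + snr)

def superposition_rates(alpha, P, N1, N2):
    """Rates (R1, R2) of two-layer superposition coding with total power P.
    A fraction alpha of the power carries the detail layer (for the good
    channel, noise N1); the base layer (noise N2 > N1) is decoded by everyone
    while treating the detail layer as extra noise."""
    R2 = capacity((1 - alpha) * P / (alpha * P + N2))
    R1 = capacity(alpha * P / N1)      # good receiver removes the base layer first
    return R1, R2

P, N1, N2, p_good = 100.0, 1.0, 10.0, 0.5
C1, C2 = capacity(P / N1), capacity(P / N2)

# Broadcast: time-sharing achieves R1/C1 + R2/C2 = 1; superposition goes beyond.
R1, R2 = superposition_rates(0.1, P, N1, N2)
print(round(R1 / C1 + R2 / C2, 3))    # > 1

def expected_rate(alpha):
    """Base layer always arrives; the detail layer only when the channel is good."""
    r1, r2 = superposition_rates(alpha, P, N1, N2)
    return r2 + p_good * r1

alphas = np.linspace(0.0, 1.0, 101)
expected = [expected_rate(a) for a in alphas]
best = int(np.argmax(expected))
print("worst-case design:", round(C2, 3),
      "superposition:", round(expected[best], 3), "alpha:", alphas[best])
```

Setting `alpha = 0` recovers the worst-case design, so the optimized superposition code can never do worse on average.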

7.5.2 Packet Video 

Another example of application of multiresolution coding for transmission is found 
in real-time services such as voice and video over asynchronous transfer mode 
(ATM) networks. The problem is that packet transmission can have greatly varying 
delays as well as packet losses. However, it is possible to protect certain packets (for 
example, using priorities). Again, the natural idea is to use multiresolution source
coding and put the coarse approximation into high priority packets so that it will
almost surely be received [154]. The detail information is carried in lower priority
packets and will only arrive when the network has enough resources to carry them. Such
an approach can lead to substantial improvements over nonprioritized transmission
[107]. In video compression, this approach is often called layered coding, with the
layers corresponding to different levels of approximation (typically, two layers are
used) and different layers having different levels of protection for transmission.
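A toy simulation contrasts prioritized and nonprioritized transmission of a two-layer signal. The Haar split into a coarse and a detail layer, the packet size, and the loss model are all simplifying assumptions. In this loss model the network delivers a fixed fraction of the packets, and lost coefficients are replaced by zeros.

```python
import numpy as np

def haar_layers(x):
    """Coarse layer (Haar lowpass) and detail layer (Haar highpass)."""
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def from_layers(low, high):
    """Inverse of haar_layers."""
    x = np.empty(2 * len(low))
    x[0::2], x[1::2] = (low + high) / np.sqrt(2), (low - high) / np.sqrt(2)
    return x

def transmit(x, fraction, prioritized, packet=16, seed=0):
    """Send both layers in packets of `packet` coefficients over a network that
    delivers only `fraction` of the packets; lost coefficients become zero.
    With priorities the coarse packets are delivered first; otherwise the
    delivered packets are chosen at random."""
    layers = dict(zip(("low", "high"), haar_layers(x)))
    packets = [(name, i) for name in ("low", "high")
               for i in range(0, len(layers[name]), packet)]
    n_kept = int(fraction * len(packets))
    if prioritized:
        kept = packets[:n_kept]                    # coarse packets come first
    else:
        order = np.random.default_rng(seed).permutation(len(packets))
        kept = [packets[j] for j in order[:n_kept]]
    received = {name: np.zeros_like(v) for name, v in layers.items()}
    for name, i in kept:
        received[name][i:i + packet] = layers[name][i:i + packet]
    return from_layers(received["low"], received["high"])

rng = np.random.default_rng(3)
x = np.cumsum(rng.standard_normal(2048))           # a correlated test signal
for prio in (True, False):
    mse = np.mean((x - transmit(x, 0.5, prio)) ** 2)
    print("prioritized" if prio else "random drops", round(float(mse), 2))
```

When the network carries half the packets, the prioritized scheme loses only the detail layer, whose energy is small for a correlated signal.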

This concludes our brief overview of multiresolution methods for joint source 
and channel coding. It can be argued that because of increasing interconnectivity 
and heterogeneity, traditional fixed-rate coding and transmission will be replaced 
by flexible multiresolution source coding and multiple or variable-rate transmission. 
For an interface protocol allowing such flexible interconnection, see [127]. The main 
advantage is the added flexibility, which will allow users with different requirements 
to be interconnected through a mixture of possible channels. 

APPENDIX 7.A Statistical Signal Processing

Very often, a signal has some statistical characteristics of which we can take
advantage. A full-blown treatment of statistical signal processing requires the study
of stochastic processes [122, 217]. Here, we will only consider elementary concepts 
and restrict ourselves to the discrete-time case. 

We start by reviewing random variables and then move to random processes.
Consider a real-valued random variable $X$ over $\mathbb{R}$ with distribution $P_X$. The
distribution $P_X(A)$ indicates the probability that the random variable $X$ takes on a
value in $A$, where $A$ is a subset of the real line. The cumulative distribution function
(cdf) $F_X$ is defined as

$$F_X(\alpha) = P_X(\{x \mid x \le \alpha\}), \qquad \alpha \in \mathbb{R}.$$

The probability density function (pdf) is related to the cdf (assume that $F_X$ is
differentiable) as

$$f_X(\alpha) = \frac{dF_X(\alpha)}{d\alpha}, \qquad \alpha \in \mathbb{R},$$

and thus

$$F_X(\alpha) = \int_{-\infty}^{\alpha} f_X(x)\,dx, \qquad \alpha \in \mathbb{R}.$$
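As a quick numerical check of these relations, the following sketch uses a standard Gaussian (an arbitrary choice). Integrating the pdf with the trapezoidal rule reproduces the closed-form cdf, and the fraction of samples not exceeding $\alpha$ estimates $P_X(\{x \mid x \le \alpha\})$.

```python
import math
import numpy as np

def pdf(x):
    """Standard Gaussian pdf."""
    return np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

def cdf_closed_form(a):
    """Standard Gaussian cdf via the error function."""
    return 0.5 * (1 + math.erf(a / math.sqrt(2)))

def cdf_by_integration(a, lower=-12.0, steps=200_001):
    """F_X(a) as the integral of the pdf from (effectively) -infinity to a,
    computed with the trapezoidal rule."""
    x = np.linspace(lower, a, steps)
    y = pdf(x)
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(x)))

rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)
for a in (-1.0, 0.0, 1.5):
    empirical = float(np.mean(samples <= a))     # relative frequency of {X <= a}
    print(a, round(cdf_closed_form(a), 4), round(cdf_by_integration(a), 4),
          round(empirical, 4))
```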



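A vector random variable $\mathbf{X}$ is a collection of $k$ random variables $(X_0, \ldots, X_{k-1})$,
with a cdf $F_{\mathbf{X}}$ given by

$$F_{\mathbf{X}}(\boldsymbol{\alpha}) = P_{\mathbf{X}}(\{\mathbf{x} \mid x_i \le \alpha_i,\ i = 0, 1, \ldots, k-1\}),$$

where $\boldsymbol{\alpha} = (\alpha_0, \ldots, \alpha_{k-1})$. The pdf is obtained, assuming differentiability, as

$$f_{\mathbf{X}}(\boldsymbol{\alpha}) = \frac{\partial^k}{\partial\alpha_0\,\partial\alpha_1 \cdots \partial\alpha_{k-1}}\, F_{\mathbf{X}}(\alpha_0, \alpha_1, \ldots, \alpha_{k-1}).$$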

A key notion is independence of random variables. A collection of $k$ random variables
is independent if and only if the joint pdf has the form

$$f_{X_0 X_1 \cdots X_{k-1}}(x_0, x_1, \ldots, x_{k-1}) = f_{X_0}(x_0)\, f_{X_1}(x_1) \cdots f_{X_{k-1}}(x_{k-1}). \tag{7.A.1}$$

In particular, if each random variable has the same distribution, then we have an 
independent and identically distributed (iid) random vector. 

Intuitively, a discrete-time random process is the infinite-dimensional general- 
ization of a vector random variable. Therefore, any finite subset of random variables 
from a random process is a vector random variable. 

Example 7.3 Jointly Gaussian Random Process 



An important class of vector random variables is the Gaussian vector random variable
of dimension $k$. To define its pdf, we need a length-$k$ vector $\mathbf{m}$ and a positive definite matrix
$\Lambda$ of size $k \times k$. Then, the $k$-dimensional Gaussian pdf is given by

$$f(\mathbf{x}) = (2\pi)^{-k/2} (\det\Lambda)^{-1/2}\, e^{-(\mathbf{x}-\mathbf{m})^T \Lambda^{-1} (\mathbf{x}-\mathbf{m})/2}, \qquad \mathbf{x} \in \mathbb{R}^k. \tag{7.A.2}$$

Note how, for $k = 1$ and $\Lambda = \sigma^2$, this reduces to the usual Gaussian (normal) distribution

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-m)^2/2\sigma^2}, \qquad x \in \mathbb{R},$$

of which (7.A.2) is a $k$-dimensional generalization.

A discrete-time random process is jointly Gaussian if all finite subsets of samples
$\{X_{n_0}, X_{n_1}, \ldots, X_{n_{k-1}}\}$ are Gaussian random vectors. Thus, a Gaussian random process is
completely described by $\mathbf{m}$ and $\Lambda$, which are called the mean and covariance, as we will see.
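Equation (7.A.2) translates directly into code. The sketch below (the function name and test values are illustrative) checks the reduction to the scalar normal density for $k = 1$. It also checks that for a diagonal $\Lambda$ the joint pdf factors as in (7.A.1), which shows that uncorrelated jointly Gaussian variables are independent.

```python
import numpy as np

def gaussian_pdf(x, m, Lam):
    """k-dimensional Gaussian pdf of (7.A.2): mean vector m, covariance Lam."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    m = np.atleast_1d(np.asarray(m, dtype=float))
    Lam = np.atleast_2d(np.asarray(Lam, dtype=float))
    k = m.size
    d = x - m
    quad = d @ np.linalg.solve(Lam, d)          # (x - m)^T Lam^{-1} (x - m)
    return float((2 * np.pi) ** (-k / 2) * np.linalg.det(Lam) ** (-0.5)
                 * np.exp(-quad / 2))

# k = 1, Lam = sigma^2: the usual normal density.
sigma2, m = 4.0, 1.0
print(np.isclose(gaussian_pdf(2.0, m, sigma2),
                 np.exp(-(2.0 - m) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)))  # True

# Diagonal Lam: the joint pdf factors into a product of marginals, as in (7.A.1).
x = np.array([0.3, -1.2])
joint = gaussian_pdf(x, [0, 0], np.diag([1.0, 2.0]))
print(np.isclose(joint, gaussian_pdf(0.3, 0, 1.0) * gaussian_pdf(-1.2, 0, 2.0)))  # True
```

Solving a linear system instead of forming $\Lambda^{-1}$ explicitly is the usual numerically safer choice.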

For random variables as for random processes, a fundamental concept is that of
expectation, defined as

$$E(X) = \int_{-\infty}^{\infty} x\, f_X(x)\,dx.$$

Expectation is a linear operator, that is, given two random variables $X$ and $Y$,
we have $E(aX + bY) = aE(X) + bE(Y)$. The expectation of products of random
variables leads to the concept of correlation. Given two random variables $X$ and
$Y$, their correlation is $E(XY)$. They are uncorrelated if

$$E(XY) = E(X)\,E(Y).$$

From (7.A.1) we see that independent variables are uncorrelated (but uncorrelatedness
is not sufficient for independence). Sometimes, the "centralized" correlation,
or covariance, is used, namely

$$\operatorname{cov}(X,Y) = E\big((X - E(X))(Y - E(Y))\big) = E(XY) - E(X)E(Y),$$

from which it follows that two random variables are uncorrelated if and only if their
covariance is zero. The variance of $X$, denoted by $\sigma_X^2$, equals $\operatorname{cov}(X,X)$, that is,

$$\sigma_X^2 = E\big((X - E(X))^2\big),$$

and its square root $\sigma_X$ is called the standard deviation of $X$. Higher-order moments
are obtained from $E(X^k)$, $k > 2$. The above functions can be extended to random
processes. The autocorrelation function of a process $\{X_n, n \in \mathbb{Z}\}$ is defined by

$$R_X[n,m] = E(X_n X_m), \qquad n, m \in \mathbb{Z},$$

and the autocovariance function is

$$K_X[n,m] = \operatorname{cov}(X_n, X_m) = R_X[n,m] - E(X_n)E(X_m), \qquad n, m \in \mathbb{Z}.$$
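The remark that uncorrelatedness does not imply independence has a classic illustration: take $X$ symmetric about zero and $Y = X^2$. Then $\operatorname{cov}(X, Y) = E(X^3) - E(X)E(X^2) = 0$, yet $Y$ is a function of $X$. The sketch below uses a uniform $X$ on $[-1, 1]$ (an arbitrary choice) and sample averages in place of expectations.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, 200_000)   # symmetric around zero
Y = X ** 2                             # a function of X, hence not independent of X

def cov(a, b):
    """Sample version of cov(A, B) = E(AB) - E(A)E(B)."""
    return float(np.mean(a * b) - np.mean(a) * np.mean(b))

print(round(cov(X, Y), 3))        # close to 0: X and Y are uncorrelated
# Independence would also force cov(X^2, Y) = 0, but it equals
# E(X^4) - E(X^2)^2 = 1/5 - 1/9, about 0.089.
print(round(cov(X ** 2, Y), 3))
```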
An important class of processes are stationary random processes, for which the
probabilistic behavior is constant over time. In particular, the following then hold:

$$E(X_n) = E(X), \qquad n \in \mathbb{Z}, \tag{7.A.3}$$

$$\sigma_{X_n}^2 = \sigma_X^2, \qquad n \in \mathbb{Z}. \tag{7.A.4}$$

By the same token, all other moments are independent of $n$. Also, correlation and
covariance depend only on the difference $(n - m)$, or

$$R_X[n,m] = R_X[n-m], \qquad n, m \in \mathbb{Z}, \tag{7.A.5}$$

$$K_X[n,m] = K_X[n-m], \qquad n, m \in \mathbb{Z}. \tag{7.A.6}$$

While stationarity implies that the full probabilistic description is time-invariant,
$n$th-order stationarity means that distributions and expectations involving $n$ samples
are time-invariant. The case $n = 2$, which corresponds to (7.A.3) to (7.A.6), is called
wide-sense stationarity. An important property of Gaussian random processes is
that if they are wide-sense stationary, then they are also strictly stationary.

Often, we are interested in filtering a random process by a linear time-invariant
filter with impulse response $h[n]$. That is, the output equals $Y[n] = \sum_{k=-\infty}^{\infty} h[k]\,X[n-k]$.
Note that $Y[\cdot]$ and $X[\cdot]$ denote random variables and are thus capitalized,
while $h[\cdot]$ is deterministic. We will assume a stable and causal filter. The
expected value of the output is

$$E(Y[n]) = E\Big(\sum_{k=0}^{\infty} h[k]\,X[n-k]\Big) = \sum_{k=0}^{\infty} h[k]\,E(X[n-k]) = \sum_{k=0}^{\infty} h[k]\,m_{n-k}, \tag{7.A.7}$$

where $m_i$ is the expected value of $X_i$. Note that if the input is wide-sense stationary,
that is, $E(X_n) = E(X)$ for all $n$, then the output has a constant expected value
equal to $E(X)\sum_{k=0}^{\infty} h[k]$. It can be shown that the covariance function of the output
also depends only on the difference $n - m$ (as in (7.A.5)) and thus, filtering by a
linear time-invariant system conserves wide-sense stationarity (see Problem 7.9).
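The constant output mean $E(X)\sum_k h[k]$ is easy to confirm by simulation. In the sketch below, the iid input with mean 2 and the FIR filter `h` are arbitrary choices. The first `len(h) - 1` output samples are discarded, since they are affected by the zero initial conditions of `np.convolve`.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = 2.0 + rng.standard_normal(n)          # iid input with E(X) = 2
h = np.array([0.5, 0.3, 0.2, -0.1])       # an arbitrary stable, causal FIR filter
y = np.convolve(x, h)[len(h) - 1:n]       # y[n] = sum_k h[k] x[n - k], transient dropped
print(round(float(np.mean(y)), 3), 2.0 * h.sum())   # both close to 1.8
```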

When considering filtered wide-sense stationary processes, it is useful to introduce
the power spectral density function (psdf), which is the discrete-time Fourier
transform of the autocorrelation function

$$S_X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} R_X[n]\, e^{-j\omega n}.$$

Then, it can be shown that the psdf of the output process after filtering with $h[n]$
equals

$$S_Y(e^{j\omega}) = |H(e^{j\omega})|^2\, S_X(e^{j\omega}), \tag{7.A.8}$$
where $H(e^{j\omega})$ is the discrete-time Fourier transform of $h[n]$. Note that when the
input is zero-mean and uncorrelated, that is, $R_X[n] = E(X^2)\,\delta[n]$, then the output autocorrelation
is simply a scaled autocorrelation of the filter, or $R_Y[n] = E(X^2)\,\langle h[k], h[k+n]\rangle$, as can
be seen from (7.A.8). If we define the crosscorrelation function

$$R_{XY}[m] = E(X[n]\,Y[n+m]),$$

then its Fourier transform leads to

$$S_{XY}(e^{j\omega}) = H(e^{j\omega})\, S_X(e^{j\omega}). \tag{7.A.9}$$

Again, when the input is uncorrelated, this can be used to measure $H(e^{j\omega})$.
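Both statements can be verified with white input of unit variance. In that case the output autocorrelation should match $\langle h[k], h[k+m] \rangle$, and the crosscorrelation $R_{XY}[m]$ should reproduce $h[m]$ itself, which is the time-domain counterpart of measuring $H(e^{j\omega})$ with (7.A.9). The filter taps and sample size are arbitrary choices, and sample averages stand in for expectations.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400_000
h = np.array([1.0, 0.5, -0.25])          # an arbitrary causal FIR filter
x = rng.standard_normal(n)               # zero-mean, unit-variance, uncorrelated
y = np.convolve(x, h)[:n]                # y[n] = sum_k h[k] x[n - k]

def corr(a, b, m):
    """Sample estimate of E(a[n] b[n + m]) for a lag m >= 0."""
    return float(np.mean(a[:len(a) - m] * b[m:]))

lags = range(4)
R_Y = [corr(y, y, m) for m in lags]
R_Y_filter = [float(np.sum(h[:len(h) - m] * h[m:])) for m in lags]  # <h[k], h[k+m]>
R_XY = [corr(x, y, m) for m in lags]     # estimates h[m], by (7.A.9)
print(np.round(R_Y, 3), R_Y_filter)
print(np.round(R_XY, 3), h)
```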

An important application of filtering is in linear estimation. The simplest linear
estimation problem is when we have two random variables $X$ and $Y$, both with zero
mean. We wish to find an estimate $\hat{X}$ of the form $\hat{X} = aY$ from the observation
$Y$, such that the mean square error (MSE) $E((X - \hat{X})^2)$ is minimized. It is easy
to verify that

$$a = \frac{E(XY)}{E(Y^2)}$$

minimizes the expected squared error. One distinctive feature of the MSE estimate
is that the estimation error $(X - \hat{X})$ is orthogonal (in expected value) to the
observation $Y$, that is,

$$E((X - \hat{X})Y) = E((X - aY)Y) = E(XY) - aE(Y^2) = 0.$$

This is known as the orthogonality principle: The best linear estimate in the MSE
sense is the orthogonal projection of $X$ onto the span of $Y$. It follows that the
minimum MSE is

$$E((X - \hat{X})^2) = E(X^2) - a^2 E(Y^2),$$

because the orthogonality of $(X - \hat{X})$ and $Y$ gives the Pythagorean relation
$E(X^2) = E((X - \hat{X})^2) + E(\hat{X}^2)$, with $E(\hat{X}^2) = a^2 E(Y^2)$. This geometric view follows from
the interpretation of $E(XY)$ as an inner product, and thus $E(X^2)$ is the squared
length of the vector $X$. Similarly, orthogonality of $X$ and $Y$ is seen as $E(XY) = 0$.
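A Monte Carlo check of the scalar case follows, assuming a unit-variance $X$ observed in additive noise of variance 0.25 (an arbitrary model). The theory predicts $a = 1/1.25 = 0.8$ and a minimum MSE of $1 - 0.8^2 \cdot 1.25 = 0.2$. With sample averages in place of expectations, the orthogonality relation holds exactly (up to rounding), and any other coefficient gives a larger sample MSE.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500_000
X = rng.standard_normal(n)                   # zero mean, E(X^2) = 1
Y = X + 0.5 * rng.standard_normal(n)         # noisy observation, E(Y^2) = 1.25

a = float(np.mean(X * Y) / np.mean(Y ** 2))  # sample version of E(XY)/E(Y^2)
X_hat = a * Y

def mse(coef):
    """Sample mean square error of the estimate coef * Y."""
    return float(np.mean((X - coef * Y) ** 2))

print(round(a, 3))                                     # close to 0.8
print(round(mse(a), 3))                                # close to 0.2
print(abs(float(np.mean((X - X_hat) * Y))) < 1e-9)     # orthogonality: True
print(mse(a) < mse(a + 0.05) and mse(a) < mse(a - 0.05))  # True
```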
Based on this powerful geometric point of view, let us tackle a more general linear
estimation problem. Assume two zero-mean jointly wide-sense stationary processes
$\{X[n]\}$ and $\{Y[n]\}$. We want to estimate $X[n]$ from $Y[n]$ using a filter with the
impulse response $h[n]$, that is

$$\hat{X}[n] = \sum_{k} h[k]\,Y[n-k], \tag{7.A.10}$$

in such a way that $E((X[n] - \hat{X}[n])^2)$ is minimized. The range of $k$ is restricted to a
set $K$ (for example, $k \ge 0$ so that only $Y[n], Y[n-1], \ldots$ are used). The orthogonality
principle states that the optimal solution will satisfy

$$E((X[n] - \hat{X}[n])\,Y[k]) = 0, \qquad k \in K.$$

Using (7.A.10), we can rewrite the orthogonality condition as

$$\begin{aligned}
E(X[n]Y[k]) - E\Big(\sum_i h[i]\,Y[n-i]\,Y[k]\Big)
&= R_{XY}[n,k] - \sum_i h[i]\,R_Y[n-i,k] \\
&= R_{XY}[n-k] - \sum_i h[i]\,R_Y[n-k-i] = 0, \qquad k \in K,
\end{aligned}$$

where we used wide-sense stationarity in $R_{XY}[n,k] = R_{XY}[n-k]$. Replacing $n-k$
by $l$, we get

$$R_{XY}[l] = \sum_i h[i]\,R_Y[l-i], \qquad n - l \in K. \tag{7.A.11}$$

In particular, when there is no restriction on the set of samples $\{Y[n]\}$ used for the
estimation, that is $K = \mathbb{Z}$, then we can take the Fourier transform of (7.A.11) to
find

$$H(e^{j\omega}) = \frac{S_{XY}(e^{j\omega})}{S_Y(e^{j\omega})},$$

which is the optimal linear estimator. Note that this is in general a noncausal
filter. Finding a causal solution ($K = (-\infty, n]$) is more involved [122], but the
orthogonality principle is preserved.
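As an illustration of the noncausal solution, the sketch below assumes a first-order autoregressive signal observed in white noise, $Y = X + W$, with $W$ zero-mean and independent of $X$. In that case $S_{XY} = S_X$ and $S_Y = S_X + \sigma_W^2$. For simplicity the signal is generated and filtered circularly in the DFT domain, which makes the frequency-domain formula exact on the DFT grid. The measured MSE is compared with the frequency average of $S_X \sigma_W^2 / (S_X + \sigma_W^2)$.

```python
import numpy as np

def wiener_filter_dft(y, S_X, noise_var):
    """Noncausal Wiener filter H = S_XY / S_Y applied bin by bin in the DFT
    domain (circular convolution). For Y = X + W with W white, zero-mean and
    independent of X: S_XY = S_X and S_Y = S_X + noise_var."""
    H = S_X / (S_X + noise_var)
    return np.fft.irfft(H * np.fft.rfft(y), n=len(y))

n, rho, noise_var = 2 ** 18, 0.9, 1.0
omega = 2 * np.pi * np.fft.rfftfreq(n)       # frequencies in [0, pi]
A = 1 - rho * np.exp(-1j * omega)            # first-order AR model (an assumption)
S_X = 1.0 / np.abs(A) ** 2                   # psdf of the signal

rng = np.random.default_rng(4)
# Signal with psdf S_X: white noise shaped by 1/A (circularly, for simplicity).
x = np.fft.irfft(np.fft.rfft(rng.standard_normal(n)) / A, n=n)
y = x + np.sqrt(noise_var) * rng.standard_normal(n)
x_hat = wiener_filter_dft(y, S_X, noise_var)

mse_raw = float(np.mean((x - y) ** 2))       # about noise_var
mse_wiener = float(np.mean((x - x_hat) ** 2))
mse_theory = float(np.mean(S_X * noise_var / (S_X + noise_var)))  # frequency average
print(round(mse_raw, 3), round(mse_wiener, 3), round(mse_theory, 3))
```

The filter $H$ is real and even in frequency, so its impulse response is two-sided, which is exactly the noncausality mentioned above.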

This concludes our brief overview of statistical signal processing. One more 
topic, namely the discrete-time Karhunen-Loeve transform, is discussed in the main 
text, in Section 7.1, since it lays the foundation for transform-based signal compres- 
sion. 
Problems

7.1 For a uniform input pdf, as well as uniform quantization, prove that the distortion between
the input and the output of the quantizer is given by (7.1.14), that is

$$D = \frac{\Delta^2}{12},$$

where $\Delta$ is the quantizer step size $\Delta = (b - a)/N$, $a, b$ are the boundaries of the input, and
$N$ is the number of intervals.

7.2 Coding gain as a function of number of channels: Consider the coding gain of an ideal filter 
bank with N channels (see Section 7.1.2). 

(a) Construct a simple example where the coding gain for a 2-channel system is bigger 
than the coding gain for a 3-channel system. Hint: Construct a piecewise constant 
power spectrum for which the 2-channel system is better matched than the 3-channel 
system. 

(b) For the example constructed above, show that a 4-channel system outperforms both 
the 2- and 3-channel systems. 

7.3 Consider the coding gain (see Section 7.1.2) in an ideal subband coding system with N
channels (the filters used are ideal bandpass filters). Start with the case N = 2 before
looking at the general case.

(a) Assume that the power spectrum of the input signal $|X(e^{j\omega})|^2$ is given by

$$|X(e^{j\omega})|^2 = 1 - \frac{|\omega|}{\pi}, \qquad |\omega| < \pi.$$

Give the coding gain as a function of N.

(b) Same as above, but with

$$|X(e^{j\omega})|^2 = e^{-a|\omega|}, \qquad |\omega| < \pi.$$

Give the coding gain as a function of N and a, and compare to (a).

7.4 Huffman and run-length coding: A stream of symbols has the property that stretches of
zeros are likely. Thus, one can code the length of the stretch of zeros, after a special
"start of run" (SR) symbol.

(a) Assume there are runs of lengths 1 to 8, with probabilities:

Length        1     2     3     4      5      6      7       8
Probability   1/2   1/4   1/8   1/16   1/32   1/64   1/128   1/128

Design a Huffman code for the run lengths. How close does it come to the entropy?

(b) There are 8 nonzero symbols, plus the start of run symbol, with probabilities:

Symbol        ±1     ±2     ±3      ±4     SR
Probability   0.2    0.15   0.075   0.05   0.05

Design a Huffman code for these symbols. How close does it come to the entropy?
(c) As an example, take a typical sequence, including stretches of zeros, and encode it,
then decode it, with your Huffman code (small example). Can you decode your bit 
stream? 

(d) Give the average compression of this run-length and Huffman coding scheme. 

7.5 Consider a pyramid coding scheme as discussed in Section 7.3.2. Assume a one-dimensional 
signal and an ideal lowpass filter both for coarse-to-fine and fine-to-coarse resolution change. 

(a) Assume an exponentially decaying power spectrum

$$|X(e^{j\omega})|^2 = e^{-3|\omega|/\pi}, \qquad |\omega| < \pi.$$

Derive the variances of the coarse and the difference channels.

(b) Assume now that the coarse channel is quantized before being interpolated and used
as a prediction. Assume an additive noise model, with variance $c\Delta^2$ where $\Delta$ is the
quantizer step. Give the variance of the difference channel (which now depends on $\Delta$,
or the number of bits allocated to the coarse channel).

(c) Investigate experimentally the bit allocation problem in a pyramid coder using a 
quantized coarse version for the prediction. That is, generate some correlated random 
process (for example, first-order Markov with high correlation) and process it using 
pyramid coding. Allocate part of the bit budget to the coarse version, and the rest 
for the difference signal. Discuss the two limiting cases, that is, zero bits to the coarse 
version and all the bits for the coarse version. 

7.6 Consider the embedded zero tree wavelet (EZW) transform algorithm discussed in Sec- 
tion 7.3.4, and study a one-dimensional version. 

(a) Assume a one-dimensional octave-band filter bank and define a zero tree for this case. 
Compare to the two-dimensional case. Discuss if the dominant and subordinate passes 
of the EZW algorithm have to be modified, and if so, how. 

(b) One can define a zero tree for arbitrary subband decomposition trees (or wavelet 
packets). In which case is the zero tree most powerful? 

(c) In the case of a full tree subband decomposition in two dimensions (for example, of 
depth 3, leading to 64 channels), compare the zero tree structure with zig-zag scanning 
used in DCT. 
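For part (a), one possible one-dimensional zero tree can be sketched in Python as follows. The sketch assumes the usual dyadic layout of an octave-band decomposition: m lowpass coefficients first, then detail bands of size m, 2m, 4m, and so on. In this layout, a detail coefficient at index i has children 2i and 2i+1, and each lowpass coefficient i has the single child i+m; this last rule is our assumption, since it mirrors the two-dimensional case in which a lowpass coefficient spawns trees in the coarsest detail bands. The symbols `P`, `N`, `Z` (zerotree root), and `I` (isolated zero) follow the EZW dominant pass, but the code is illustrative and not the algorithm of Section 7.3.4.

```python
def children(i, m, n):
    """Children of coefficient i in the dyadic 1D layout with m lowpass coefficients."""
    if i < m:                    # lowpass -> coarsest detail at the same position (assumed)
        return [i + m]
    return [k for k in (2 * i, 2 * i + 1) if k < n]

def descendants(i, m, n):
    out, stack = [], children(i, m, n)
    while stack:
        k = stack.pop()
        out.append(k)
        stack.extend(children(k, m, n))
    return out

def dominant_pass(c, m, threshold):
    """Map index -> symbol for one 1D dominant pass at the given threshold."""
    n = len(c)
    symbols, skip = {}, set()
    for i in range(n):           # increasing index = coarse-to-fine scan
        if i in skip:            # already covered by a zerotree root
            continue
        if abs(c[i]) >= threshold:
            symbols[i] = "P" if c[i] > 0 else "N"
        elif all(abs(c[k]) < threshold for k in descendants(i, m, n)):
            symbols[i] = "Z"     # one symbol codes the whole insignificant tree
            skip.update(descendants(i, m, n))
        else:
            symbols[i] = "I"
    return symbols

# Hypothetical 3-level decomposition of 8 samples (m = 1 lowpass coefficient).
coeffs = [40, -3, 2, 20, 1, 0, -1, 2]
print(dominant_pass(coeffs, 1, 16))
```

Compared with two dimensions, each node here has two children instead of four, so a single zerotree symbol removes fewer coefficients; the scan order and the subordinate (refinement) pass carry over unchanged in this sketch.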

7.7 Progressive to interlaced conversion: 

(a) Verify that the filters given in (7.4.3) form a perfect reconstruction filter bank for 
quincunx downsampling and give the reconstruction filters as well. 

(b) Show that cascading the quincunx decomposition twice on a progressive sequence (on 
the vertical-time dimension) yields again a progressive sequence, with an intermediate 
interlaced sequence. Use the downsampling matrix 
7.8 Consider a two-channel filter bank for three-dimensional signals (progressive video sequences)
using FCO downsampling (see Section 7.4.4). 

(a) Consider a lowpass filter

$$H_0(z_1, z_2, z_3) = \frac{1}{\sqrt{2}}\,(1 + z_1 z_2 z_3),$$

and a highpass filter

$$H_1(z_1, z_2, z_3) = H_0(-z_1, -z_2, -z_3).$$

Show that this corresponds to an orthogonal Haar decomposition for FCO downsampling.

(b) Give the output of a two-channel analysis/synthesis system with FCO downsampling 
as a function of the input, the aliased version, and the filters. 
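A numerical illustration of part (a) is given below as a Python sketch. The filters are (1 + z1 z2 z3)/√2 and H0(-z1, -z2, -z3) = (1 - z1 z2 z3)/√2, so each output pairs a sample with its neighbor at offset (1, 1, 1). The sketch makes three assumptions: the FCO sublattice is the set of points whose coordinate sum is even, the volume is periodic with even side n, and the function names are hypothetical. Under these assumptions, the offset (1, 1, 1) maps every lattice point to a distinct point in the odd coset, so the pairs tile the grid. Perfect reconstruction and energy conservation can then be checked directly.

```python
import math
import random

def shift(p, n):
    """Neighbor at offset (1, 1, 1) with periodic wrap-around."""
    return tuple((c + 1) % n for c in p)

def fco_haar_analysis(x, n):
    """Two-channel Haar split of an n^3 periodic volume (dict) on the FCO sublattice."""
    r = 1.0 / math.sqrt(2.0)
    low, high = {}, {}
    for p in x:
        if sum(p) % 2:               # keep only lattice points (even coordinate sum)
            continue
        a, b = x[p], x[shift(p, n)]
        low[p] = r * (a + b)          # lowpass  (1 + z1 z2 z3) / sqrt(2)
        high[p] = r * (a - b)         # highpass (1 - z1 z2 z3) / sqrt(2)
    return low, high

def fco_haar_synthesis(low, high, n):
    r = 1.0 / math.sqrt(2.0)
    x = {}
    for p, c0 in low.items():
        c1 = high[p]
        x[p] = r * (c0 + c1)
        x[shift(p, n)] = r * (c0 - c1)
    return x

n = 4
rng = random.Random(1)
x = {(i, j, k): rng.uniform(-1, 1) for i in range(n) for j in range(n) for k in range(n)}
low, high = fco_haar_analysis(x, n)
y = fco_haar_synthesis(low, high, n)
```

Each channel keeps half of the samples, matching the determinant 2 of the FCO downsampling, and the 2 x 2 butterfly on each pair is orthonormal, which is what makes the decomposition an orthogonal Haar transform in this sketch.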

7.9 Filtering of wide-sense stationary processes: Consider a wide-sense stationary process {x[n]}
and its filtered version $y[n] = \sum_k h[k]\, x[n-k]$, where h[k] is a stable and causal filter.

(a) In Appendix 7.A, we saw that the mean of {y[n]} is independent of n (see below
Equation (7.A.7)). Show that the covariance function of {y[n]}, $K_Y[n,m] = \mathrm{cov}(y[n], y[m])$,
is a function of (n - m) only, and given by

$$K_Y[k] = \sum_{n=0}^{\infty} \sum_{m=0}^{\infty} h[n]\, h[m]\, K_X[k - (n - m)].$$

(b) Prove (7.A.9) in the time domain, or, assuming a zero-mean input,

$$K_{XY}[m] = \sum_{k=0}^{\infty} h[k]\, K_X[m-k].$$

(c) Consider now one-sided wide-sense stationary processes, which can be thought of as
wide-sense stationary processes that are "turned on" at time 0. Consider filtering of
such processes by causal FIR and IIR filters, respectively. What can be said about
$E(y[n])$, $n \ge 0$, in these cases?
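The double-sum formula of part (a) can be sanity-checked by simulation, as in the Python sketch below. It evaluates K_Y[k] = Σ_n Σ_m h[n] h[m] K_X[k - (n - m)] for a short causal FIR filter and compares it with sample covariances of a filtered unit-variance AR(1) input, whose covariance is K_X[k] = ρ^|k|. The filter taps, ρ = 0.8, and the helper names are hypothetical choices made for the example.

```python
import math
import random

def ky_double_sum(h, kx, k):
    """K_Y[k] = sum_n sum_m h[n] h[m] K_X[k - (n - m)] for a causal FIR filter h."""
    return sum(h[n] * h[m] * kx(k - (n - m)) for n in range(len(h)) for m in range(len(h)))

rho = 0.8
kx = lambda k: rho ** abs(k)    # assumed input: unit-variance AR(1) covariance
h = [1.0, 0.5, -0.25]           # hypothetical causal FIR filter

def sample_cov(y, k):
    """Biased-free average of (y[i+k] - mean)(y[i] - mean) over the available pairs."""
    n = len(y) - k
    my = sum(y) / len(y)
    return sum((y[i + k] - my) * (y[i] - my) for i in range(n)) / n

rng = random.Random(0)
N = 200000
x = [rng.gauss(0.0, 1.0)]
for _ in range(N - 1):
    x.append(rho * x[-1] + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0))
# Causal convolution y[n] = sum_k h[k] x[n - k] (startup transient is negligible).
y = [sum(h[j] * x[i - j] for j in range(len(h)) if i - j >= 0) for i in range(N)]
for k in range(4):
    print(k, round(ky_double_sum(h, kx, k), 4), round(sample_cov(y, k), 4))
```

The formula also gives K_Y[-k] = K_Y[k], because swapping n and m and using the evenness of K_X leaves the sum unchanged.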



Projects: The following problems are computer-based projects with an experimental flavor. 
Access to adequate data (images, video) is helpful. 

7.10 Coding gain and R(d) optimal filters for subband coding: Consider a two-band perfect
reconstruction subband coder with orthogonal filters in lattice structure. As an input, use a
first-order Markov process with high correlation (ρ = 0.9). For small filter lengths (L = 4, 6
or so), optimize the lattice coefficients so as to maximize coding gain or minimize first-order
entropy after uniform scalar quantization. Find which filter is optimal, and try both fine and
coarse quantization steps.

Use optimal bit allocation between the two channels, if possible. The same idea can be 
extended to Lloyd-Max quantization, and to logarithmic trees. This project requires some 
experience with coding algorithms. For relevant literature, see [79, 109, 244, 295]. 
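A starting point for the coding-gain half of this project is the Python sketch below. It builds orthonormal two-channel filters from a lattice of planar rotations. Stage 0 gives h0 = [cos a0, sin a0] and h1 = [-sin a0, cos a0], and each further stage applies a rotation to (H0(z), z^-2 H1(z)), which keeps the polyphase matrix paraunitary. For a unit-variance AR(1) input with K_X[k] = ρ^|k|, the sketch computes the subband variances and the two-band coding gain, that is, the arithmetic over the geometric mean of the subband variances. It then grid-searches the two angles for L = 4. The rotation parameterization, the 1-degree grid, and the function names are our assumptions; the lattice form in the text may be normalized differently.

```python
import math

def lattice_filters(angles):
    """Orthonormal two-channel filters of length 2 * len(angles) from a rotation lattice."""
    c, s = math.cos(angles[0]), math.sin(angles[0])
    h0, h1 = [c, s], [-s, c]
    for a in angles[1:]:
        c, s = math.cos(a), math.sin(a)
        g0 = h0 + [0.0, 0.0]           # H0(z), zero-padded
        g1 = [0.0, 0.0] + h1           # z^-2 H1(z)
        h0 = [c * u + s * v for u, v in zip(g0, g1)]
        h1 = [-s * u + c * v for u, v in zip(g0, g1)]
    return h0, h1

def subband_variance(h, rho):
    """Output variance of filter h driven by unit-variance AR(1) input."""
    return sum(h[i] * h[j] * rho ** abs(i - j) for i in range(len(h)) for j in range(len(h)))

def coding_gain_db(angles, rho):
    h0, h1 = lattice_filters(angles)
    v0, v1 = subband_variance(h0, rho), subband_variance(h1, rho)
    # Two equal-size bands: arithmetic mean over geometric mean of the variances.
    return 10 * math.log10(((v0 + v1) / 2) / math.sqrt(v0 * v1))

rho = 0.9
haar = coding_gain_db([math.pi / 4], rho)              # L = 2 (Haar) reference
grid = [k * math.pi / 180 for k in range(180)]         # 1-degree grid (assumed)
best = max((coding_gain_db([a0, a1], rho), a0, a1) for a0 in grid for a1 in grid)
print(f"Haar: {haar:.3f} dB, best L=4 on grid: {best[0]:.3f} dB")
```

Because setting the second angle to zero reproduces the Haar pair padded with zeros, the L = 4 search can never do worse than Haar. The entropy-based criterion and the bit allocation asked for in the project would replace `coding_gain_db` with a quantize-and-measure loop.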
7.11 Pyramids using nonlinear operators: One of the attractive features of pyramid coding schemes
over critically sampled coding schemes is that nonlinear operators can be used. The goal of 
the project is to investigate the use of median filters (or some other nonlinear operators) in 
a pyramidal scheme. 

The results could be theoretical or experimental. The project requires image processing 
background. For relevant literature, see [41, 138, 303, 323]. 
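The property that makes nonlinear operators usable in a pyramid is that the difference signal is formed against the same prediction the decoder will compute. Reconstruction is therefore exact for any decimation and interpolation operators. The one-dimensional Python sketch below illustrates this with a 3-point median filter followed by downsampling by 2, and sample repetition as the interpolator. These operator choices, the edge replication, and the function names are assumptions for the example.

```python
import statistics

def median_decimate(x):
    """Fine-to-coarse: 3-point median (edges replicated), then keep every other sample."""
    padded = [x[0]] + list(x) + [x[-1]]
    med = [statistics.median(padded[i:i + 3]) for i in range(len(x))]
    return med[::2]

def interpolate(c, n):
    """Coarse-to-fine: sample repetition back to length n."""
    return [c[i // 2] for i in range(n)]

def analyze(x, levels):
    """Return the coarsest signal and the difference signals (finest first)."""
    diffs, cur = [], list(x)
    for _ in range(levels):
        coarse = median_decimate(cur)
        pred = interpolate(coarse, len(cur))
        diffs.append([a - b for a, b in zip(cur, pred)])
        cur = coarse
    return cur, diffs

def synthesize(coarse, diffs):
    cur = coarse
    for d in reversed(diffs):
        pred = interpolate(cur, len(d))
        cur = [p + e for p, e in zip(pred, d)]
    return cur

x = [3, 7, 7, 100, 8, 9, 2, 2, 5, 40, 6]   # hypothetical signal with impulsive outliers
coarse, diffs = analyze(x, 3)
print(coarse, diffs)
```

With integer input, all operations are exact, so `synthesize(coarse, diffs)` returns the input unchanged. The median also keeps isolated spikes out of the coarse versions, which is one motivation for nonlinear pyramids.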

7.12 Motion compensation of motion vectors: In video coding, motion compensation is used to 
predict a new frame from reconstructed previous frames. Usually, a sparse set of motion 
vectors is used (such as one per 8x8 block), and thus, sending motion vectors contributes 
little to the bit rate overhead. An alternative scheme could use a dense motion vector field 
in order to reduce the prediction error. In order to reduce the overhead, predict the motion 
vector field, since it is usually not changing radically in time within a video scene. Thus, 
the aim of the project is to treat the motion vector field as a sequence (of vectors), and find 
a meta-motion vector field to predict the actual motion vector field (for example, per block 
of 2x2 motion vectors). 

This project requires image/video processing background. For more literature on motion 
estimation, see [138, 207]. 
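One way to start this project is to treat the motion vector fields as images whose "pixels" are (dx, dy) pairs and to run ordinary block matching on them, as in the Python sketch below. Here `prev` and `cur` are two successive dense motion vector fields. For each block of 2x2 motion vectors, the sketch searches a meta-motion vector (u, v) within a small range so that `cur[i][j]` is predicted by `prev[i + u][j + v]`, and it returns the residual field. The L1 cost, the ±1 search range, border clamping, and the function name are all hypothetical choices.

```python
def meta_motion(prev, cur, block=2, search=1):
    """Predict motion-vector field `cur` from `prev` with one meta-vector per block.

    prev, cur: equal-size 2D lists of (dx, dy) tuples.
    Returns (meta, residual), meta[(bi, bj)] = (u, v) for the block at (bi, bj).
    """
    rows, cols = len(cur), len(cur[0])

    def at(i, j):                      # clamp indices to the field borders
        return prev[min(max(i, 0), rows - 1)][min(max(j, 0), cols - 1)]

    meta = {}
    residual = [[None] * cols for _ in range(rows)]
    for bi in range(0, rows, block):
        for bj in range(0, cols, block):
            cells = [(i, j) for i in range(bi, min(bi + block, rows))
                     for j in range(bj, min(bj + block, cols))]
            best = None
            for u in range(-search, search + 1):
                for v in range(-search, search + 1):
                    # L1 distance between actual and predicted vectors over the block.
                    cost = sum(abs(cur[i][j][0] - at(i + u, j + v)[0]) +
                               abs(cur[i][j][1] - at(i + u, j + v)[1]) for i, j in cells)
                    if best is None or cost < best[0]:
                        best = (cost, u, v)
            _, u, v = best
            meta[(bi, bj)] = (u, v)
            for i, j in cells:
                p = at(i + u, j + v)
                residual[i][j] = (cur[i][j][0] - p[0], cur[i][j][1] - p[1])
    return meta, residual
```

For example, if the vector field simply moves down by one row between frames, every block finds the meta-vector (-1, 0) and the residual is zero. Only the meta-vectors (one per 2x2 block) and a small residual then need to be sent.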

7.13 Adaptive Karhunen-Loeve transform: The Karhunen-Loeve transform is optimal for energy 
packing of stationary processes, and under certain conditions, for transform coding and 
quantization of such processes. However, if the process is nonstationary, compression might 
be improved by using an adaptive transform. An interesting solution is an overhead-free
transform, derived from the coded version of the signal and based on some estimate of
local correlations.

The goal of the project is to explore such an adaptive transform on some synthetic
nonstationary signals, as well as on real signals (such as speech).

This project requires good signal processing background. For more literature, see [143]. 
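The key point of the overhead-free scheme is that the transform is estimated only from data the decoder already has. The Python sketch below uses blocks of two samples. For each new block, it estimates the 2x2 second-moment matrix of the last few *decoded* blocks and rotates by the angle of its principal eigenvector, θ = ½ atan2(2 K01, K00 - K11); this rotation is the 2-point KLT. It then quantizes both coefficients uniformly. The decoder repeats the same estimate and never receives transform side information. The block size, window length, default angle π/4 before any history exists, absence of mean removal, and the function names are assumptions for the example.

```python
import math

def klt_angle(blocks):
    """Angle of the principal eigenvector of the 2x2 second-moment matrix of `blocks`."""
    if not blocks:
        return math.pi / 4           # assumed default before any history exists
    n = len(blocks)
    k00 = sum(b[0] * b[0] for b in blocks) / n
    k11 = sum(b[1] * b[1] for b in blocks) / n
    k01 = sum(b[0] * b[1] for b in blocks) / n
    return 0.5 * math.atan2(2 * k01, k00 - k11)

def rotate(x0, x1, th):
    return (math.cos(th) * x0 + math.sin(th) * x1, -math.sin(th) * x0 + math.cos(th) * x1)

def unrotate(y0, y1, th):
    return (math.cos(th) * y0 - math.sin(th) * y1, math.sin(th) * y0 + math.cos(th) * y1)

def encode(x, step, window=16):
    """Backward-adaptive 2-point KLT coder; returns quantizer indices and reconstruction."""
    history, indices, recon = [], [], []
    for k in range(0, len(x) - 1, 2):
        th = klt_angle(history[-window:])        # only past decoded data is used
        y0, y1 = rotate(x[k], x[k + 1], th)
        q = (round(y0 / step), round(y1 / step))
        block = unrotate(q[0] * step, q[1] * step, th)
        indices.append(q)
        history.append(block)
        recon.extend(block)
    return indices, recon

def decode(indices, step, window=16):
    history, out = [], []
    for q in indices:
        th = klt_angle(history[-window:])        # same estimate as the encoder
        block = unrotate(q[0] * step, q[1] * step, th)
        history.append(block)
        out.extend(block)
    return out
```

On a signal whose correlation flips sign halfway through, the estimated angle tracks the change after a few blocks, and the decoder output matches the encoder's reconstruction exactly, because both sides run identical computations on identical data.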

7.14 Three-dimensional wavelet coding: In medical imaging and remote sensing, one often
encounters three-dimensional data. For example, multispectral satellite imagery consists of
many spectral band images. Develop a simple three-dimensional coding algorithm based on
the Haar filters, and iteration on the lowpass channel. This is the three-dimensional
equivalent of the octave-band subband coding of images discussed in Section 7.3.3. Apply your
algorithm to real imagery if available, or generate synthetic data with a lowpass nature.
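The transform stage of such a coder can be sketched in plain Python, as below. The sketch applies a one-level orthonormal Haar split along each of the three axes, which puts the lowpass half first on every axis. It then iterates on the lowpass corner, giving the three-dimensional analogue of the octave-band decomposition. The cube layout (nested lists, side divisible by 2^levels, levels >= 1) and the function names are assumptions; quantization and entropy coding of the subbands would follow as in the two-dimensional case.

```python
import math

def _index(axis, a, b, t):
    idx = [a, b]
    idx.insert(axis, t)              # place the running index t on the chosen axis
    return idx

def haar_axis(vol, size, axis, inverse=False):
    """One-level orthonormal Haar along `axis`, restricted to the size^3 lowpass corner."""
    r = 1.0 / math.sqrt(2.0)
    half = size // 2
    for a in range(size):
        for b in range(size):
            pos = [_index(axis, a, b, t) for t in range(size)]
            line = [vol[i][j][k] for i, j, k in pos]
            if not inverse:          # output order: [lowpass | highpass]
                new = ([r * (line[2 * t] + line[2 * t + 1]) for t in range(half)] +
                       [r * (line[2 * t] - line[2 * t + 1]) for t in range(half)])
            else:                    # interleave lowpass/highpass back into samples
                new = [0.0] * size
                for t in range(half):
                    new[2 * t] = r * (line[t] + line[half + t])
                    new[2 * t + 1] = r * (line[t] - line[half + t])
            for (i, j, k), v in zip(pos, new):
                vol[i][j][k] = v

def forward(vol, levels):
    size = len(vol)
    for _ in range(levels):
        for axis in range(3):
            haar_axis(vol, size, axis)
        size //= 2                   # iterate on the lowpass corner only

def inverse(vol, levels):
    size = len(vol) >> (levels - 1)
    for _ in range(levels):
        for axis in reversed(range(3)):
            haar_axis(vol, size, axis, inverse=True)
        size *= 2
```

Because each step is orthonormal, the decomposition preserves energy. For a constant cube of side n, iterating to a single lowpass sample concentrates all the energy in `vol[0][0][0]`.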
multidimensional, 184, 293
synthesis filter banks, 108, 114

time-domain analysis, 113, 122, 129, 167 

tree-structured, 148, 161 

two-channel, 106, 112, 131 

used for construction of wavelets, 246 
filters 

allpass, 68 

Butterworth, 61, 67, 147, 288 

complementary, 145 

Daubechies', 135, 137, 138, 267 

Haar, 105, 141, 150 

infinite impulse response, 145, 148 

linear phase, 140, 142, 144 
quadrature mirror, 127, 144
Smith-Barnwell, 134

Vaidyanathan and Hoang, 138 
Fourier theory, 1, 37 

best approximation property, 44 
discrete Fourier transform, 53

discrete-time Fourier series, 52, 97, 101, 
discrete-time Fourier transform, 50

Fourier series, 43, 212 

Fourier transform, 39 

short-time Fourier transform in continuous time
short-time Fourier transform in discrete time
frame bounds, 332
of short-time Fourier transform, 339
time localization of wavelet frames, 338
frequency localization, 110, 275, 320, 338 

Gabor transform, see Fourier theory: short-time Fourier transform in continuous time
Gram-Schmidt orthogonalization, 23
generalization to multiple dimensions, 297
linear operators on, 85
image compression, 414
JPEG standard, 420
image database, 9
interlaced scanning, 449

iterated filter banks, see filter banks: used for construction of wavelets
joint source-channel coding, 464

digital broadcast, 465 
separation principle, 464
JPEG image coding standard, 420 
Laplace transform, 59

lapped orthogonal transforms, 163 

in image coding, 419 
lattices, 202 
reciprocal, 202

separable, 203 
linear algebra, 29
least-squares approximation, 32
mean square error, 386
motion

and subband coding, 461 

models, 447 
motion-compensated video coding, 453 
MPEG video compression standard, 463 
multirate operations, 68 
multiresolution, 3, 414, 451 

analysis, 158, 222, 293 

approximation and detail spaces, 158, 159, 221

axiomatic definition, 223 

decomposition, 158 

pyramids, 9, 181 

transmission, 465 
MUSICAM, 412 

orthogonal projections, 25 
orthogonality, 21 

orthonormal bases, see orthonormal expansions
orthonormal expansions, 23, 97, 100, 150,
completeness, 116

Haar, 104 

periodically time-invariant, 98 

sinc, 109

time-invariant, 110 
overcomplete expansions, 28, 101, 179 
overlap-add/save algorithms, 376 

packet video, 467 

ATM networks, 467 
Parseval's equality, see conservation of energy
perceptual coding, 449

of audio, 409 

of images, 417, 438 

of video, 449 
piecewise Fourier series, 215 
autocorrelation, 134

correlation, 119, 141, 144 

cyclotomic, 350 
polyphase transform, 74 
power complementary condition, see conservation of energy, 133, 177, 180
predictive quantization, 396 

differential pulse code modulation, 396 
progressive scanning, 449 
pyramids, 179, 181 

bit allocation, 424 

comparison with subband coding for video 
decimation and interpolation operators, 423

in image coding, 421 

in video coding, 454 

oversampling, 424 

quantization noise, 422 

quadrature mirror filters, 127 
quantization, 390 

bit allocation, 397 

coding gain, 401 

error analysis in a subband system, 444 

Lloyd-Max, 393 

of DCT coefficients, 417 

of the subbands, 430 

predictive, 396 

scalar, 390 

uniform, 391 

vector, 393 
quincunx, see lattices: quincunx, subband coding: quincunx
Quintilian, 97 

random processes, see statistical signal processing: random process

jointly Gaussian, 468 

stationary, 470 

wide-sense stationary, 470 
regularity, 90, 257 

in subband coding, 429 

sufficient condition, 263 
reproducing kernel, 323 
resolution, 78 



run-length coding, 406 

sampling, 47 

theorem, 48, 213 
scalar quantization, 390 

centroid condition, 392 

Lloyd-Max, 393 

nearest neighbor condition, 392 

uniform, 391 
scale, 78 

series expansions, 3 
block discrete-time Fourier series, 103

continuous-time, 38, 211 

discrete-time, 38, 100 

discrete-time Fourier series, 52, 101, 102 

Fourier series, 43, 212 

sampling theorem, 49, 213 
Shensa's algorithm, 369 

short-time Fourier transform in continuous time, 325

discretization, 331 

fast algorithm and complexity, 371 

Gaussian window, 327 

properties, 325 
short-time Fourier transform in discrete time 

fast algorithm, 366 
signal-to-noise ratio, 386 
sinc expansion, 109, 213, 230, 248

basis property, 109 

iterated, 160 
Smith and Barnwell filters, 136 
Smith-Barnwell condition, 131 
spectral factorization, 65, 134 
speech compression, 407 

high-quality, 408 

linear predictive coding, 408 

production model, 407 
spline spaces, 238 
statistical signal processing, 467 

correlation, 469 

covariance, 469 

cumulative distribution function, 468 

expectation, 469 

jointly Gaussian random process, 468 

linear estimation, 471 






orthogonality principle, 471 

power spectral density function, 470 

probability density function, 468 

random process, 468 

stationary random processes, 470 

uncorrelatedness, 469 

variance, 469 

wide-sense stationarity, 470 
Stromberg wavelet, 242 
subband coding, 2, 383, 425, 438 

bit allocation, 432 

choice of filters, 428 

comparison with pyramids for video, 462 

entropy coding, 433 

joint design of quantization and filtering, 445

nonorthogonal, 446 

nonseparable decompositions, 426 

of images, 425 

of video, 456, 459 

quantization error analysis, 444 

quantization of the subbands, 430 

quincunx, 426 

separable decompositions, 425 
successive approximation, 27, 98 

time localization, 108, 109, 214, 274, 319, 338
time-frequency representations, 7, 76 
transmultiplexers, 192 

analysis, 193 

crosstalk, 194 

perfect reconstruction, 194 
two-scale equation, 224, 255, 277, 293 

uncertainty principle, 79 
upsampling, 71, 113, 203 

Vaidyanathan and Hoang filters, 136 
vector quantization, 393 

fractional bit rate, 394 

of subbands, 431 

packing gain, 394 

removal of linear and nonlinear dependencies, 394
vector spaces, 18 



video compression, 446 
compatibility, 451 

motion-compensated video coding, 453 
MPEG standard, 463 
perceptual point of view, 449 
progressive/interlaced scanning, 449, 456 
pyramid coding, 454 
three-dimensional subband coding, 459 
transform coding, 448 

wavelet coding, 425, 438 

based on wavelet maximums, 443 

based on zero trees, 438 

best basis algorithm, 442 
wavelet series, 270 

biorthogonal, 282 

characterization of singularities, 275 

fast algorithm and complexity, 369 

frequency localization, 275 

Haar, 216 

Mallat's algorithm, 280 

properties of basis functions, 276 

sinc, 230

time localization, 274 
wavelet theory, 1 

admissibility condition, 313 

basis property of wavelet series, 255 

Battle-Lemarie wavelets, 242 

characterization of singularities, 275 

continuous-time wavelet transform, see wavelet transform

Daubechies' wavelets, 267 

discrete-time wavelet series, 150, 154 

frequency localization, see frequency localization, 214, 275

Haar wavelet, 216, 228, 247 

Meyer's wavelet, 233 

moment properties, 277 

orthogonalization procedure, 240 

regularity, 257 

resolution of the identity, 314 

scaling function, 224 

sinc wavelet, 230, 248

Stromberg wavelet, 242 

time localization, see time localization, 214, 274
two-scale equation, 224, 255, 277, 293

wavelet, 226 

wavelet packets, 161, 289 

wavelet series, 270 

wavelet transform, 83 
wavelet transform, 313 

admissibility condition, 313 

characterization of regularity, 320 

conservation of energy, 318 

discretization of, 329 

frequency localization, 320 

properties, 316 

reproducing kernel, 323 

resolution of the identity, 314 

scalograms, 325 

time localization, 319 
wavelets 

"twin dragon", 298 

based on Butterworth filters, 288 

based on multichannel filter banks, 289 

Battle-Lemarie, 242 

biorthogonal, 282 

construction of, 226 

Daubechies', 221, 267 

Haar, 216, 228, 247 

Malvar's, 301 

Meyer's, 233 

Morlet's, 324 

mother wavelet, 313 

multidimensional, 293 

sinc, 230, 248

spline, 238 

Stromberg's, 242 

with exponential decay, 288 
Wigner-Ville distribution, 84 
Winograd short convolution algorithms, 350 

z-transform, 62, 117
zero trees, 438 



